opencv - Image detection features: SIFT, HISTOGRAM and EDGE
I am working on developing an object classifier using 3 different features, i.e. SIFT, histogram and edge.
However, these 3 features have vectors of different dimensions, e.g. SIFT = 128 dimensions, histogram = 256.
Now these features cannot be concatenated into one vector because of their different sizes. This is what I am planning, but I am not sure whether I am going about it the correct way:
For each feature, train a classifier separately, apply classification separately for the 3 different features, count the majority, and declare the image as the class with the majority of votes.
Do you think this is the correct way?
There are several ways to get classification results that take multiple features into account. What you have suggested is one possibility: instead of combining the features, you train multiple classifiers and, through some protocol, arrive at a consensus between them. This typically falls under the field of ensemble methods. Try googling for boosting and random forests for more details on how to combine classifiers.
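As a rough illustration of the majority-vote protocol described in the question, here is a minimal sketch. The function name `majority_vote`, the labels and the tie-breaking rule (first label seen wins) are assumptions made for the example; they are not taken from any library.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most classifiers.

    predictions: non-empty list of labels, one per classifier.
    Ties go to the label that appears first in the list
    (an arbitrary choice made for this sketch).
    """
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label

# Hypothetical outputs of the SIFT, histogram and edge classifiers
# for a single image.
votes = ["car", "bike", "car"]
winner = majority_vote(votes)
```

With only three classifiers and more than two classes, three-way ties are possible, so the tie-breaking rule matters more than it might seem.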
However, it is not true that your feature vectors cannot be concatenated because they have different dimensions. You can still concatenate the features together into one huge vector. E.g., joining your SIFT and hist features together will give you a vector of 384 dimensions. Depending on the classifier you use, you will likely have to normalize the entries of the vector so that no one feature dominates simply because, by construction, it has larger values.
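A minimal sketch of the concatenation idea, assuming (as in the question) one 128-dimensional SIFT vector and one 256-bin histogram per image. L2 normalization of each block is just one possible choice, and the helper names and random data are made up for illustration.

```python
import numpy as np

def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit L2 norm (eps guards against division by zero)."""
    v = np.asarray(v, dtype=np.float64)
    return v / (np.linalg.norm(v) + eps)

def combine_features(sift_vec, hist_vec):
    """Normalize each feature block separately, then join them end to end."""
    return np.concatenate([l2_normalize(sift_vec), l2_normalize(hist_vec)])

# Placeholder data: the histogram has much larger raw values than SIFT.
rng = np.random.default_rng(0)
sift_vec = rng.random(128)
hist_vec = rng.random(256) * 1000.0

combined = combine_features(sift_vec, hist_vec)  # 128 + 256 = 384 entries
```

Normalizing each block before joining keeps the histogram's larger raw counts from swamping the SIFT part in distance-based classifiers.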
EDIT in response to your comment: It appears that your histogram is a feature vector describing a characteristic of the entire object (e.g. color), whereas your SIFT descriptors are extracted at local interest keypoints of that object. Since the number of SIFT descriptors may vary from image to image, you cannot pass them directly to a typical classifier, as these usually take in one feature vector per sample you wish to classify. In such cases, you will have to build a codebook (also called a visual dictionary) using the SIFT descriptors you have extracted from many images. You will then use this codebook to derive a single feature vector from the many SIFT descriptors you extract from each image. This is what is known as a "bag of visual words (BOW)" model. Now that you have a single vector that "summarizes" the SIFT descriptors, you can concatenate it with your histogram to form a bigger vector. This single vector now summarizes the entire image (or the object in the image).
For details on how to build the bag of words codebook and how to use it to derive a single feature vector from the many SIFT descriptors extracted from each image, look at this book (free for download from the author's website) http://programmingcomputervision.com/ under the chapter "Searching Images". It is actually a lot simpler than it sounds.
Roughly, you just run k-means to cluster the SIFT descriptors from many images and take the centroids (each of which is a vector called a "visual word") as the codebook. E.g. for K = 1000 you have a 1000-visual-word codebook. Then, for each image, create a result vector of the same size as K (in this case 1000). Each element of this vector corresponds to a visual word. Then, for each SIFT descriptor extracted from an image, find its closest matching vector in the codebook and increment the count in the corresponding cell of the result vector. When you are done, this result vector essentially counts how frequently the different visual words appear in the image. Similar images will have similar counts for the same visual words, and hence this vector effectively represents your images. You will also need to "normalize" this vector to make sure that images with different numbers of SIFT descriptors (and hence different total counts) are comparable. This can be as simple as dividing each entry by the total count in the vector, or it can be done through a more sophisticated measure such as tf/idf as described in the book.
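The steps above can be sketched with plain NumPy as follows. This is a simplified illustration, not the book's code: the k-means loop is a bare-bones version, the function names are hypothetical, random arrays stand in for real SIFT descriptors, and K is kept small so the example runs quickly.

```python
import numpy as np

def nearest_word(descriptors, codebook):
    """Index of the closest codebook entry for each descriptor."""
    diff = descriptors[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=2).argmin(axis=1)

def build_codebook(descriptors, k, n_iter=10, seed=0):
    """Very small k-means: the k centroids are the visual words."""
    rng = np.random.default_rng(seed)
    start = rng.choice(len(descriptors), size=k, replace=False)
    codebook = descriptors[start].astype(np.float64)
    for _ in range(n_iter):
        labels = nearest_word(descriptors, codebook)
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:  # leave empty clusters where they are
                codebook[j] = members.mean(axis=0)
    return codebook

def bow_histogram(descriptors, codebook):
    """Count visual-word occurrences, then divide by the total count."""
    words = nearest_word(descriptors, codebook)
    counts = np.bincount(words, minlength=len(codebook)).astype(np.float64)
    total = counts.sum()
    return counts / total if total > 0 else counts

# Placeholder data standing in for SIFT descriptors (128-d each).
rng = np.random.default_rng(1)
all_descriptors = rng.random((500, 128))    # pooled from many images
codebook = build_codebook(all_descriptors, k=20)

image_descriptors = rng.random((37, 128))   # descriptors of one image
bow_vec = bow_histogram(image_descriptors, codebook)
```

The resulting `bow_vec` has a fixed length K regardless of how many descriptors the image produced, so it can be concatenated with the histogram feature.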
I believe the author also provides Python code on his website to accompany the book. Take a look at it or experiment with it if you are unsure.
A more sophisticated method for combining features is Multiple Kernel Learning (MKL). In this case, you compute different kernel matrices, each using one feature. You then find the optimal weights to combine the kernel matrices and use the combined kernel matrix to train an SVM. You can find code for this in the Shogun Machine Learning Library.
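To make the kernel-combination step concrete, here is a small sketch that builds one RBF kernel matrix per feature and mixes them with fixed weights. Note that this is not MKL itself: real MKL learns the weights, whereas here they are set by hand. The RBF kernel, the gamma values, the function names and the random data are all assumptions chosen for illustration.

```python
import numpy as np

def rbf_kernel(X, gamma):
    """RBF kernel matrix between all pairs of rows in X."""
    sq = (X ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * np.maximum(d2, 0.0))  # clamp rounding noise

def combine_kernels(kernels, weights):
    """Weighted sum of kernel matrices, with weights rescaled to sum to 1."""
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()
    return sum(w * K for w, K in zip(weights, kernels))

# Placeholder features for 10 images: a BOW vector and a histogram each.
rng = np.random.default_rng(0)
bow_feats = rng.random((10, 50))
hist_feats = rng.random((10, 256))

K_bow = rbf_kernel(bow_feats, gamma=1.0 / 50)
K_hist = rbf_kernel(hist_feats, gamma=1.0 / 256)

# Hand-picked equal weights; an MKL solver would optimize these instead.
K = combine_kernels([K_bow, K_hist], weights=[0.5, 0.5])
```

The combined matrix `K` can then be given to any SVM implementation that accepts a precomputed kernel.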