by Ronsenshi 3661 days ago
For large datasets, a simple "Bag of Words" approach is actually not that great, since every feature you extract has to be compared against the whole vocabulary. A more modern approach is to use a Vocabulary Tree to represent your bag of words. The tree significantly reduces the amount of matching that has to be done for each individual feature.
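
To make the saving concrete, here is a rough sketch of the idea in plain Python, not a reference implementation. It assumes the tree is built by hierarchical k-means over the descriptors, which is one common way to do it; the names (`Node`, `build`, `quantize`), the branching factor of 4, the depth of 3 and the random 8-D "descriptors" are all made up for illustration.

```python
import random

def dist2(a, b):
    # Squared Euclidean distance between two descriptors.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean(points):
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))

def kmeans(points, k, rng, iters=10):
    # Plain Lloyd iterations; centers start as k randomly sampled points.
    centers = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda j: dist2(p, centers[j]))
            groups[i].append(p)
        # An empty group keeps its previous center.
        centers = [mean(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers, groups

class Node:
    def __init__(self, center):
        self.center = center
        self.children = []
        self.word_id = None  # set only on leaves (the visual words)

def build(points, branching, depth, rng, leaves):
    # Recursively cluster the points; returns the child nodes for this level.
    if depth == 0 or len(points) < branching:
        return []
    centers, groups = kmeans(points, branching, rng)
    children = []
    for center, group in zip(centers, groups):
        if not group:
            continue
        child = Node(center)
        child.children = build(group, branching, depth - 1, rng, leaves)
        if not child.children:
            child.word_id = len(leaves)
            leaves.append(child)
        children.append(child)
    return children

def quantize(level, feature):
    # Walk down the tree, comparing only against the children at each level.
    comparisons = 0
    while True:
        comparisons += len(level)
        best = min(level, key=lambda n: dist2(feature, n.center))
        if not best.children:
            return best.word_id, comparisons
        level = best.children

rng = random.Random(0)
descriptors = [tuple(rng.random() for _ in range(8)) for _ in range(2000)]
leaves = []
tree = build(descriptors, branching=4, depth=3, rng=rng, leaves=leaves)
word, tree_comparisons = quantize(tree, descriptors[0])
print("vocabulary size:", len(leaves))
print("comparisons with the tree:", tree_comparisons)
print("comparisons with a flat vocabulary:", len(leaves))
```

With branching factor k and depth L, a lookup costs at most k*L distance computations, while a flat vocabulary of up to k^L words costs one per word. The trade-off is that the greedy descent is approximate: it can land on a leaf that is not the globally nearest word.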

[1] http://www.cs.utexas.edu/~grauman/courses/fall2009/papers/ba...