| HN Mirror


by apu 5339 days ago

On the other hand, a lot of new research (including possibly ground-breaking theoretical results) is only possible now that we have access to large data.

We might initially process the large data using relatively simple techniques, but on the reduced data we can now run more sophisticated methods that actually work, because the underlying data comes from a huge number of samples.

As but one example, in computer vision, the concept of "attributes" -- automatically labeling objects using descriptive words instead of categorical ones, i.e., "this thing is like..." rather than "this thing is..." -- has opened the door to a number of exciting advances. One is "zero-shot learning": automatically recognizing an object that you've never seen an instance of before, purely from a description. For example, one could recognize beavers as "small, four-legged furry rodents with big teeth and a flat tail" without ever having seen a beaver. The training data for this classifier need not include any beavers, only images that exhibit the individual attributes (small, four-legged, furry, rodent, big teeth, flat tail), and not necessarily all in the same image.
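To make the beaver example concrete, here is a minimal sketch of the second stage only. It assumes some per-attribute classifiers have already produced a probability for each attribute on a new image. The class descriptions, the attribute names beyond the beaver ones, and the function name are all made up for illustration, and the scoring rule (treating attributes as independent and summing log-probabilities) is just one simple way to combine them, not necessarily what any particular paper does.

```python
import math

# Hypothetical class descriptions: the attributes each class is expected to have.
# No training images of these classes are needed, only the descriptions.
CLASS_ATTRIBUTES = {
    "beaver": {"small", "four-legged", "furry", "rodent", "big teeth", "flat tail"},
    "squirrel": {"small", "four-legged", "furry", "rodent", "bushy tail"},
    "eagle": {"feathered", "winged", "hooked beak"},
}


def zero_shot_classify(attribute_probs, class_attributes=CLASS_ATTRIBUTES):
    """Pick the class whose description best matches the attribute scores.

    attribute_probs maps attribute name -> probability (0..1) that the image
    shows that attribute, as output by separately trained attribute classifiers.
    Attributes are (naively) treated as independent.
    """
    all_attributes = set().union(*class_attributes.values())
    scores = {}
    for name, expected in class_attributes.items():
        log_score = 0.0
        for attr in all_attributes:
            # A missing attribute score is treated as uninformative (0.5).
            p = attribute_probs.get(attr, 0.5)
            # Clamp to avoid log(0).
            p = min(max(p, 1e-6), 1.0 - 1e-6)
            # Reward present attributes the class should have,
            # and absent attributes it should not have.
            log_score += math.log(p if attr in expected else 1.0 - p)
        scores[name] = log_score
    best = max(scores, key=scores.get)
    return best, scores


# Made-up attribute-classifier outputs for one image.
example_probs = {
    "small": 0.9, "four-legged": 0.9, "furry": 0.9, "rodent": 0.8,
    "big teeth": 0.8, "flat tail": 0.9, "bushy tail": 0.1,
    "feathered": 0.05, "winged": 0.05, "hooked beak": 0.05,
}
label, _ = zero_shot_classify(example_probs)
print(label)  # beaver
```

Note that "beaver" wins here even though nothing was ever trained on beaver images; adding a new class is just adding a new entry to the description table.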

This kind of thing was not really possible before, because there just wasn't enough data to train reliable classifiers for each attribute in any kind of automated way.

Finally, as I alluded to at the beginning, these individual attribute classifiers are often relatively simple algorithms, such as Support Vector Machines (SVMs). Yet the second-stage algorithms that use the attribute values to do something useful, such as the zero-shot learning application described above, are often considerably more sophisticated.