by morganK 3735 days ago
Would have liked to hear at least one concrete example of a startup actually doing that. Seems a bit theoretical at the moment, as big companies don't need to do that thanks to existing datasets, and I've never heard of any startups using dozens (hundreds?) of contractors for this kind of job.
7 comments

Netflix used humans to tag movies for their recommendation system.

Source: http://www.theatlantic.com/technology/archive/2014/01/how-ne...

Netflix is not a startup.
At one point, Netflix was a startup.
Yes, but it wasn't in 2014 or 2012.
CrowdFlower does AI and ML-focused microtasking, though I have no experience with them. Even large companies need plenty of preprocessing done on their datasets, so it's common to use offshored services companies or divisions to do annotation and cleanup work on corpora before using them as training sets.
In very broad strokes this is how we power many of our API features at Diffbot. We have hundreds of thousands of human-trained web pages amounting to millions of individual elements that have helped to train our system.
Not a start-up and not deep learning (until now I suppose), but this has been done for years in the translation industry.

They feed their automatic systems with the output of the human translator. Every input means less and less manual work that needs to be done in the future.

The post office used humans for many years to train OCR models, e.g. zip code readers.

I visited a postal routing facility once in the 90s and saw a long row of metal staffed by about 20 people, 10 to each side. Envelopes passed through on a sort of pneumatic tube-like conveyor and paused in front of a human operator, who read a single digit of a zip code, keyed it in, and sent the envelope on to be read by the next person.

Many, many startups use Amazon Mechanical Turk and/or CrowdFlower for this exact thing.

See http://blog.echen.me/2012/04/25/making-the-most-of-mechanica... for some examples.

hunch