by nostrademons 5856 days ago
Many machine-learning systems get bootstrapped by their implementer sitting at a website clicking "Like" and "Dislike" buttons for a large randomly-chosen sample of possible data.

If this strikes you as incredibly boring, you can farm it out with Amazon Mechanical Turk or other crowdsourcing schemes. You could also do cleverer variants of this, like putting image-recognition or OCR training sets into CAPTCHAs, submitting possible links to Reddit or Digg, or hosting Internet surveys with the questions of interest.

But the whole idea is... everyone has their own notion of likes and dislikes... Am I missing something here?
That's why recommendation systems are hard. :-)

You could try to identify a population of users whose likes and dislikes are expected to be "similar" to those of the user in question, though, and then base your training set on them. I believe that's how actual recommendation engines (e.g., Amazon, YouTube) work. Of course, then you have to figure out how to identify similar users, which is another hard problem.
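
To make the "similar users" idea concrete, here is a rough sketch of one common approach (user-based collaborative filtering with cosine similarity). It is only an illustration, not a claim about how Amazon or YouTube actually do it. The `ratings` data, the user names, and the `cosine` and `predict` helpers are all made up for the example; +1 stands for a "Like" click and -1 for a "Dislike".

```python
from math import sqrt

# Hypothetical data: +1 = Like, -1 = Dislike, missing key = not rated.
ratings = {
    "alice": {"a": 1, "b": 1, "c": -1},
    "bob":   {"a": 1, "b": 1, "c": -1, "d": 1},
    "carol": {"a": -1, "b": -1, "c": 1, "d": -1},
}

def cosine(u, v):
    """Cosine similarity over the items both users rated; 0.0 if none overlap."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item, ratings, k=2):
    """Similarity-weighted vote of the k most similar users who rated `item`.

    Returns a score in [-1, 1]; positive means "probably likes it".
    """
    scored = []
    for other, theirs in ratings.items():
        if other == user or item not in theirs:
            continue
        scored.append((cosine(ratings[user], theirs), theirs[item]))
    # Keep the k users with the highest similarity to `user`.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    top = scored[:k]
    total = sum(abs(sim) for sim, _ in top)
    if total == 0:
        return 0.0
    return sum(sim * rating for sim, rating in top) / total

# alice has not rated "d"; bob agrees with her everywhere, carol disagrees.
score = predict("alice", "d", ratings)
```

Note that a user with negative similarity still contributes, with the sign flipped: carol disagrees with alice on everything and disliked "d", which counts as weak evidence that alice would like it. Whether that is a good idea depends on the data; many systems just drop dissimilar users. The hard part mentioned above (picking a similarity measure that means something when most users have rated almost nothing) is not solved by this sketch.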