Hacker News
Ask HN: Addressing cold start problems in recommender systems?
5 points by apurva 5854 days ago
Hi all, I have been working on a recommender engine for a while and have now run into the cold start problem. The data I collect only indicates what the user likes (e.g., if browsing history is the source, the basic assumption is that people don't browse for what they don't like). Given that, any ideas on how to train the system initially for dislikes? I know the system will gradually tune itself to the user's preferences with continuous feedback, but I would rather the first run not be erratic because I picked dislikes at random. Any ideas, folks? Any help is greatly appreciated.
3 comments

Many machine-learning systems get bootstrapped by their implementer sitting at a website clicking "Like" and "Dislike" buttons for a large randomly-chosen sample of possible data.

If this strikes you as incredibly boring, you can farm it out with Amazon Mechanical Turk or other crowdsourcing schemes. You could also do cleverer variants of this, like putting image-recognition or OCR training sets into CAPTCHAs, submitting possible links to Reddit or Digg, or hosting Internet surveys with the questions of interest.

But the whole idea is... everyone has their own notion of likes and dislikes. Am I missing something here?
That's why recommendation systems are hard. :-)

You could try to identify a population of users whose likes and dislikes are expected to be "similar" to those of the user in question, though, and then base your training set on them. I believe that's how actual recommendation engines (e.g. Amazon, YouTube) work. Of course, then you have to figure out how to identify similar users, which is another hard problem.
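A minimal sketch of that idea, assuming each user's feedback is stored as a dict of item -> score (+1 for like, -1 for dislike). The function names and the choice of cosine similarity are my own assumptions for illustration, not how Amazon or YouTube actually do it:

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse rating dicts {item: score}."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def predict_from_neighbors(target, others, k=2):
    """Score items the target has not rated, using the k most similar users."""
    sims = sorted(
        ((cosine(target, ratings), ratings) for ratings in others.values()),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    totals, weights = {}, {}
    for sim, ratings in sims:
        if sim <= 0:
            continue  # ignore users who are dissimilar or unrelated
        for item, score in ratings.items():
            if item in target:
                continue  # only predict items the target has not rated
            totals[item] = totals.get(item, 0.0) + sim * score
            weights[item] = weights.get(item, 0.0) + sim
    # Similarity-weighted average of the neighbors' scores
    return {item: totals[item] / weights[item] for item in totals}

# Hypothetical data: the new user has only expressed likes
new_user = {"a": 1, "b": 1}
existing = {
    "u1": {"a": 1, "b": 1, "c": -1, "d": 1},
    "u2": {"a": -1, "b": -1, "c": 1},
    "u3": {"a": 1, "c": -1},
}
predicted = predict_from_neighbors(new_user, existing)
```

Here the negative score for "c" is borrowed from similar users, which gives the new user some initial dislikes without picking them at random.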

I read somewhere about a recommender system for movies (I think), and what they did was force a user to rate 5 movies as part of the registration process. The movies weren't entirely random, but ones they thought were significant in identifying a user's tastes.
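I don't know how that system picked its movies, but one plausible heuristic is to choose items that many people have rated and that split opinion the most, since agreement on a universally liked movie tells you little. A rough sketch, where `pick_onboarding_items` and its parameters are hypothetical names of mine:

```python
from statistics import pvariance

def pick_onboarding_items(ratings_by_item, n=5, min_ratings=3):
    """Return the n items whose existing ratings disagree the most.

    ratings_by_item maps item -> list of numeric ratings from existing users.
    Items with fewer than min_ratings ratings are skipped as too noisy.
    """
    variance = {
        item: pvariance(ratings)
        for item, ratings in ratings_by_item.items()
        if len(ratings) >= min_ratings
    }
    return sorted(variance, key=variance.get, reverse=True)[:n]

# Hypothetical ratings on a 1-5 scale
ratings = {
    "A": [5, 1, 5, 1],  # polarizing
    "B": [4, 4, 4, 4],  # everyone agrees
    "C": [5, 1],        # too few ratings
    "D": [3, 4, 3, 4],  # mild disagreement
}
onboarding = pick_onboarding_items(ratings, n=2)
```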

In your example of browsing patterns, maybe you could ask new users whether they do or do not like to read certain types of articles. For example: are you interested in technology, sports, entertainment, random pictures of cats, etc.? Then seed their profiles based on their expressed level of interest in those things (maybe including dislikes from people who claimed to have similar interests).
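Something like the following could do the seeding, assuming the survey asks for a 1-5 interest level per topic. The scale, the topic names, and both function names are assumptions for illustration only:

```python
def seed_profile(interest_levels, neutral=3, scale=2.0):
    """Map 1-5 survey answers to initial weights in [-1, 1].

    Answers below the neutral point become negative weights,
    which act as the user's initial dislikes.
    """
    return {
        topic: (level - neutral) / scale
        for topic, level in interest_levels.items()
    }

def score_article(profile, topics):
    """Average the seeded weights over an article's topics.

    Topics the user was never asked about count as 0 (no opinion).
    """
    if not topics:
        return 0.0
    return sum(profile.get(t, 0.0) for t in topics) / len(topics)

# Hypothetical survey answers from a new user
survey = {"technology": 5, "sports": 1, "entertainment": 3, "cats": 4}
profile = seed_profile(survey)
tech_cat_score = score_article(profile, ["technology", "cats"])
sports_score = score_article(profile, ["sports"])
```

The seeded weights are only a starting point; real feedback would be expected to overwrite them fairly quickly.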

But I would think that dislikes are not so important in the beginning. Although I don't know how your algorithm works, if you have a rough idea of what a person likes, shouldn't you be able to recommend things that they might like just based on that? When you end up recommending something that they don't like, you'll get some dislike data and can start factoring that in.

My startup faces a variant of this problem. It's not exactly a recommender system, but it will get more accurate as time goes on. I am compensating by putting in an initial value that is a guess, which then adjusts as time goes on. Sort of messy but necessary.
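One common way to implement "start from a guess and let the data take over" is to treat the guess as a handful of pretend observations. This is a generic sketch, not a description of the startup's system; `SmoothedEstimate` and `prior_weight` are names I made up:

```python
class SmoothedEstimate:
    """Running average that starts from a guess instead of from nothing."""

    def __init__(self, prior_guess, prior_weight=5.0):
        # The guess counts as prior_weight imaginary observations
        self.total = prior_guess * prior_weight
        self.weight = prior_weight

    def update(self, observation):
        """Fold in one real observation."""
        self.total += observation
        self.weight += 1.0

    @property
    def value(self):
        return self.total / self.weight

# Guess 0.5 up front, then observe four positive signals
estimate = SmoothedEstimate(prior_guess=0.5, prior_weight=4.0)
for signal in [1.0, 1.0, 1.0, 1.0]:
    estimate.update(signal)
# estimate.value is now (0.5 * 4 + 4) / 8 = 0.75
```

A larger `prior_weight` makes the system trust the guess longer; a smaller one lets early feedback move the value quickly, at the cost of a more erratic first run.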

It sounds like you're saying that in your case, the value will be different for each person, so you don't have a way of seeding it correctly for different people. I sort of think you have to have SOME info to go off of. Hunch asks you questions up front; maybe you could do something like that?

Well, so here is what I am doing instead... I only get a notion of the user's likes, and since "everything else" is too huge a domain to treat as dislikes, I rank the user's liked concepts and pick the lowest-ranked ones as the ones they like less. This is only for initialization; after that I leave the filter to tune itself through feedback. I know it's not the best approach in the world, but let me see how this shapes up. Thanks of course, and any other ideas are still welcome!
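For anyone curious, the initialization described above could look roughly like this. It assumes each concept has a count of how often it showed up in the browsing history; the `bottom_fraction` cutoff and the function name are placeholders, not the actual implementation:

```python
def split_likes(concept_counts, bottom_fraction=0.25):
    """Rank liked concepts by count and mark the lowest ones as 'liked less'.

    Returns (liked, liked_less). The liked_less list stands in for
    dislikes during initialization only.
    """
    ranked = sorted(concept_counts, key=concept_counts.get, reverse=True)
    if len(ranked) < 2:
        return ranked, []  # nothing to contrast against
    n_less = max(1, int(len(ranked) * bottom_fraction))
    cutoff = len(ranked) - n_less
    return ranked[:cutoff], ranked[cutoff:]

# Hypothetical concept counts from one user's browsing history
counts = {"python": 40, "ml": 25, "databases": 10, "gardening": 2}
liked, liked_less = split_likes(counts)
```

With these counts, "gardening" ends up as the single weak negative, and feedback takes over from there.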