by rorrr 5400 days ago

    What reddit really needs is a ranking engine
    recommending users, submissions and comments 
    to you based on the users you've friended and 
    the comments and submissions you've upvoted.
Do you realize the insane number of calculations that would take?

For each pageview you would have to find all stories in that subreddit, find all the authors, find all stories you ever voted on, find all the users you are friends with, and somehow rank all this shit.

It will cost millions of dollars a month in CPU time just to run something on the scale of reddit.

2 comments

Sounds like a challenge :)

First, this doesn't have to be for each pageview. Crudely, you want every story to have a vector in some giant hyperspace characterizing it, and every user to have a vector precomputed offline based on their previously shown affinities, and take dot products.
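To make the dot-product idea concrete, here is a rough sketch, not anyone's production code. The vectors, the three made-up "topic" dimensions, and the names `story_vectors`, `user_vectors` and `rank_stories` are all assumptions for illustration; a real system would learn the vectors in an offline batch job.

```python
# Hypothetical, hand-made vectors: each dimension is some latent "topic".
# In a real system these would be learned offline, not written by hand.
story_vectors = {
    "story_a": [0.9, 0.1, 0.0],
    "story_b": [0.1, 0.8, 0.1],
    "story_c": [0.0, 0.2, 0.9],
}

# One precomputed vector per user, refreshed by a batch job (e.g. nightly).
user_vectors = {
    "alice": [1.0, 0.0, 0.2],
}


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def rank_stories(user, stories, users):
    """Return story ids sorted by affinity to the user, best first."""
    u = users[user]
    scored = [(dot(u, vec), story_id) for story_id, vec in stories.items()]
    scored.sort(reverse=True)
    return [story_id for _, story_id in scored]


ranking = rank_stories("alice", story_vectors, user_vectors)
print(ranking)
```

The per-pageview work is then one dot product per candidate story; the expensive part (building the vectors) happens off the request path.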

This is not necessarily easy, or computationally very cheap, but the payoff can be pretty big.

I believe LinkedIn runs something similar to this every night.

I'm not sure what you mean exactly, but hey, if you can pull it off, it might be successful.

Just think of the number of comments up/down voted every millisecond, and that's just one of the dimensions. The scale of that matrix would be enormous.

How is it much different from the ranking engines Netflix and Amazon use?
In those cases, users purely receive recommendations for media/products. In this case, users receive recommendations for other users to follow as well.
You can calculate the "weight" of the movie for a given user once. The number of movies is very low, compared to the number of stories on reddit.

Whatever that guy is proposing is an insanely complex dynamic system.

I'm not saying it's not doable, I'm just saying it's very expensive.

It might be less complicated than you'd expect (at least one way of doing it): http://en.wikipedia.org/wiki/Collaborative_filtering

The core idea is linear regression.
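One way to read the regression remark, as a sketch under assumptions rather than what any of these services actually do: in a matrix-factorization flavor of collaborative filtering, you alternate between fixing the story factors and fixing the user factors, and each half-step is a small ridge regression. The example below uses a single latent factor so the regression has a closed form; the ratings, `LAMBDA`, and all names are invented for illustration.

```python
# Hypothetical votes/ratings: (user, story) -> score. Unlisted pairs are unknown.
ratings = {
    ("u1", "s1"): 5.0, ("u1", "s2"): 4.0,
    ("u2", "s1"): 4.0, ("u2", "s2"): 5.0, ("u2", "s3"): 1.0,
    ("u3", "s2"): 1.0, ("u3", "s3"): 5.0,
}
LAMBDA = 0.1  # ridge penalty; an arbitrary choice for this sketch

users = sorted({u for u, _ in ratings})
stories = sorted({s for _, s in ratings})
p = {u: 1.0 for u in users}     # one latent factor per user
q = {s: 1.0 for s in stories}   # one latent factor per story


def loss():
    """Regularized squared error over the observed ratings."""
    err = sum((r - p[u] * q[s]) ** 2 for (u, s), r in ratings.items())
    reg = LAMBDA * (sum(x * x for x in p.values()) + sum(x * x for x in q.values()))
    return err + reg


def refit(target, fixed, key_index):
    """Ridge-regress each entry of `target` on the fixed factors it was rated with."""
    for key in target:
        num = den = 0.0
        for pair, r in ratings.items():
            if pair[key_index] == key:
                x = fixed[pair[1 - key_index]]
                num += r * x
                den += x * x
        target[key] = num / (den + LAMBDA)


initial_loss = loss()
for _ in range(20):
    refit(p, q, 0)  # users: regress ratings on story factors
    refit(q, p, 1)  # stories: regress ratings on user factors
final_loss = loss()


def predict(user, story):
    """Predicted score for a (user, story) pair, seen or unseen."""
    return p[user] * q[story]


print(initial_loss, final_loss, predict("u1", "s3"))
```

With more than one factor per user and story, each half-step becomes a multi-variable least-squares solve, but the alternating structure stays the same.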

Here's Google's service: http://code.google.com/apis/predict/. Here's a commercial API service: http://www.directededge.com/.