Hacker News
by schulz 5722 days ago
The difference is that Digg has network information (i.e., followers). Stack Overflow doesn't.

Think through the ramifications of each of these page flows:

1. Give me this question and everybody who commented on it.

2. Give me every post that the people this user is following have made over the last n days.

That's why.

2 comments

I think this is exactly it. You can't compare a comment thread on Digg to a question on SO.
That would increase the load on the database. Why does that increase the load on the webservers?
It depends on how you approach the problem. I'm guessing here, from my own experience implementing social graph features, but here goes:

There are a couple of ways you can go about this. The first is the database: join the network of people against the land of content and bring it back. This doesn't work; your database will cry. (As an aside, this is what people mean when they say "web scale": it has nothing to do with web traffic, it's social graphs.) Not at first, though: in development it works fine, and you feel fine, and for a while you're OK, but you start growing....
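To make the join approach concrete, here is a minimal sketch using an in-memory SQLite database. The table names (`follows`, `posts`), the columns, and the `age_days` shortcut standing in for a real timestamp are all assumptions for illustration, not anybody's actual schema.

```python
import sqlite3


def build_db():
    """Create a tiny hypothetical social graph plus content table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE follows (follower TEXT, followee TEXT);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            author TEXT,
            age_days INTEGER,  -- stand-in for a created_at timestamp
            body TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO follows VALUES (?, ?)",
        [("alice", "bob"), ("alice", "carol"), ("dave", "bob")],
    )
    conn.executemany(
        "INSERT INTO posts (author, age_days, body) VALUES (?, ?, ?)",
        [
            ("bob", 1, "bob recent"),
            ("carol", 10, "carol old"),
            ("erin", 0, "erin recent"),
            ("carol", 2, "carol recent"),
        ],
    )
    return conn


def feed_by_join(conn, user, n_days):
    """Join the user's network against all content at read time."""
    return conn.execute(
        """
        SELECT p.author, p.body
        FROM follows AS f
        JOIN posts AS p ON p.author = f.followee
        WHERE f.follower = ? AND p.age_days <= ?
        ORDER BY p.age_days, p.id
        """,
        (user, n_days),
    ).fetchall()
```

Every page view pays for the whole join, so the cost grows with both the number of people followed and the volume of their posts. That is fine with three rows in development and much less fine with a large graph.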

Another way you can go about it is by denormalizing. In this world you store a pointer to each content item for each user. So any time I do something, all the people [following|watching|connected to|friended by] me get a record indicating I did this. This works, but now you have lots of data (lots and lots of data!) spread all over the place. You need some kind of system to push that data out to everybody. It's those last two, the data volume and the push system, that drive up your hardware usage: not necessarily web boxes, but boxes in the background broadcasting the events out to the world, and the datastores to hold it all. Depending on how your web code works, you could also have a lot of overhead on the webservers putting all that stuff together.
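A rough sketch of the denormalized version, with plain Python containers standing in for the datastores and the background broadcast boxes. The class and method names (`FanOutFeed`, `run_broadcast_worker`, and so on) are made up for illustration; a real system would use a durable queue and a real store.

```python
from collections import defaultdict, deque


class FanOutFeed:
    """Hypothetical fan-out-on-write feed: one pointer per follower per post."""

    def __init__(self):
        self.followers = defaultdict(set)  # author -> set of followers
        self.inbox = defaultdict(list)     # user -> post ids (the pointers)
        self.posts = {}                    # post id -> (author, body)
        self.queue = deque()               # posts waiting to be broadcast

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def publish(self, author, body):
        """Store the post once and enqueue it for background broadcast."""
        post_id = len(self.posts) + 1
        self.posts[post_id] = (author, body)
        self.queue.append(post_id)
        return post_id

    def run_broadcast_worker(self):
        """Push each queued post into every follower's inbox.

        Returns the number of inbox writes, which is where the extra
        storage and background work comes from.
        """
        writes = 0
        while self.queue:
            post_id = self.queue.popleft()
            author, _body = self.posts[post_id]
            for follower in self.followers[author]:
                self.inbox[follower].append(post_id)
                writes += 1
        return writes

    def feed(self, user):
        """Read path: no join, just follow the user's pointers, newest first."""
        return [self.posts[pid] for pid in reversed(self.inbox[user])]
```

The read becomes a cheap lookup, but each post costs one write per follower, so somebody with 750k connections turns a single post into 750k inbox records.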

My experience here comes from building the social features into toolbox.com. A good example is this page: http://it.toolbox.com/people/george_krautzel/posts-connectio... That's all the posts from users connected to our CEO (all 750k of them). Getting that to return in near real time is super fun (and you can probably tell that I went down the DB join path before it all fell apart).