HN Mirror



	by InclinedPlane 5088 days ago
	Why does content size matter? The challenge is that every page of content, apart from the individual tweets themselves, is unique to each user. That defeats the vast majority of straightforward caching implementations. You can't ever cache fully rendered pages, because the chance that one timeline view at a given time will be identical to any other view (even by the same person at a different time) is about as close to zero as it gets. Every view is dynamically generated from anywhere between several hundred and several thousand different streams of data, which have to be put in order and have all of the per-user metadata set correctly. Once you start looking into the actual mathematical constraints of the Twitter problem, you realize that it's a scaling nightmare: hundreds of millions of updates per day and tens of thousands of views per second (billions per day). There are only a few people in the world who have the right to look down on stats like that.
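	To make the "put in order" step concrete, here is a minimal sketch of building one timeline at read time. It assumes each followed account's tweets are already stored newest-first; the `streams` data and the `build_timeline` helper are made up for illustration and say nothing about how Twitter actually does it.

```python
import heapq
from itertools import islice

# Hypothetical per-author streams: (timestamp, author, text), newest first.
streams = {
    "alice": [(105, "alice", "a3"), (90, "alice", "a2"), (10, "alice", "a1")],
    "bob": [(100, "bob", "b2"), (50, "bob", "b1")],
    "carol": [(95, "carol", "c1")],
}


def build_timeline(following, limit):
    """Merge the streams of every account in `following`, newest first."""
    feeds = [streams[name] for name in following if name in streams]
    # Lazy k-way merge; reverse=True because every input is sorted descending.
    merged = heapq.merge(*feeds, key=lambda item: item[0], reverse=True)
    # Only the first `limit` entries are ever pulled out of the merge.
    return list(islice(merged, limit))


timeline = build_timeline(["alice", "bob", "carol"], limit=4)
```

	Because the result depends on the exact set of accounts followed and on the moment of the request, two users (or the same user a minute later) almost never get the same list, which is why the rendered page is a poor cache key.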

1 comments

burningout 5088 days ago

Again, as the parent poster also said, I think you have never worked on large data. Twitter is like a big mailbox, except that every mail is only 160 bytes. This was solved 10 years ago.


jasonwatkinspdx 5088 days ago

If you don't understand that the request distribution matters more than payload size, you aren't even seeing the problems.

I encourage you to analyze the infrastructure for a Twitter-style app using inbox duplication (copying each tweet into every follower's inbox). Once you model this against hardware costs, you'll learn something about how utterly expensive write amplification is in a hot data set that must be backed by RAM due to availability requirements.
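A back-of-the-envelope version of that model might look like the sketch below. Every number in it is an assumption picked for illustration (the tweet volume loosely follows the "hundreds of millions of updates per day" mentioned upthread; follower count, entry size and retention are invented), and the `Workload` and `inbox_duplication` names are hypothetical.

```python
from dataclasses import dataclass

SECONDS_PER_DAY = 86_400


@dataclass
class Workload:
    tweets_per_day: int   # assumed
    avg_followers: int    # assumed average fan-out per tweet
    bytes_per_entry: int  # assumed size of one inbox entry (an ID, not the text)
    days_in_ram: int      # assumed number of days of inboxes kept hot


def inbox_duplication(w):
    """Estimate the write load and RAM footprint of copying tweets into inboxes."""
    inbox_writes_per_day = w.tweets_per_day * w.avg_followers
    return {
        # One logical write (the tweet) turns into this many physical writes.
        "write_amplification": inbox_writes_per_day / w.tweets_per_day,
        "inbox_writes_per_sec": inbox_writes_per_day / SECONDS_PER_DAY,
        "ram_gb": inbox_writes_per_day * w.bytes_per_entry * w.days_in_ram / 1e9,
    }


model = inbox_duplication(
    Workload(tweets_per_day=200_000_000, avg_followers=100,
             bytes_per_entry=16, days_in_ram=7)
)
```

Note that the payload size only enters through `bytes_per_entry`, while the follower count multiplies everything. Multiply `ram_gb` by whatever you pay per GB of replicated RAM to get a cost figure.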


achompas 5088 days ago

Wait, see my comment below. Twitter received 15B (yes, B) API calls/day last July. How does that compare to your typical email client?

I don't want to argue that Twitter is astoundingly hard, but serving ~170K requests/sec can't really be that trivial, even if each request is only 160 bytes (it isn't, since Twitter also sends metadata, logs those messages, tracks service metrics, etc.).
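For anyone who wants to check the arithmetic, here is the conversion as a short sketch. The 15B calls/day and the 160-byte payload are the figures quoted in this thread; the result is a daily average only and makes no assumption about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60

api_calls_per_day = 15_000_000_000  # figure quoted above
payload_bytes = 160                 # the "160 bytes" claimed upthread

# Average request rate over a whole day (peaks will be higher).
avg_requests_per_sec = api_calls_per_day / SECONDS_PER_DAY

# Raw payload bandwidth if every call really carried only 160 bytes.
payload_mb_per_sec = avg_requests_per_sec * payload_bytes / 1e6

print(f"{avg_requests_per_sec:,.0f} requests/sec on average")
print(f"{payload_mb_per_sec:.1f} MB/sec of raw payload")
```

The payload bandwidth comes out small next to the request count, which is the point: the hard part is the number of requests, not the bytes.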


InclinedPlane 5088 days ago

If you treat twitter like a big mailbox, things will work "ok". It's not the worst approach ever, that's for certain. But end-user perceptible performance would be a fraction of what twitter has today.

P.S. How many images does twitter serve up per day at present? That's a tad more than 160 characters of data.


jasonwatkinspdx 5088 days ago

Another key difference is that email users generally contribute directly to their provider's infrastructure costs in providing email as a service. Email infrastructure (and the user experience) is fragmented, and global funding generally scales with global load.

Twitter, by contrast, must monetize via advertising of some form, so the percentage of folks who do not respond to ads becomes a really strong factor in its cost calculations. In this sense, email software has it easy and can be extremely wasteful in the resources it consumes.

It's not just that the availability expectations of twitter are higher than email, it's also that the economic base of the infrastructure is far more sparse.


InclinedPlane 5088 days ago

And yet another key difference is that there are few email server installations that support half a billion users. Saying that the scaling problem is "solved" because all you have to do is copy, say, gmail, is kind of silly.
