Hacker News new | ask | show | jobs
by burningout 5089 days ago
Again, as the parent poster also posted, I think you have never worked on large data. Twitter is like a big mailbox, only that every mail only has 160 bytes. This has been solved 10 years ago.
3 comments

If you don't understand that the request distribution matters more than payload size, you aren't even seeing the problems.

I encourage you to analyze infrastructure for a twitter style app using inbox duplication. Once you model this against hardware costs you'll learn something about how utterly expensive write amplification is in a hot data set that must be backed by ram due to availability requirements.

Wait, see my comment below. Twitter received 15B (yes, B) API calls/day last July. How does that compare to your typical email client?

I don't want to argue that Twitter is astoundingly hard, but serving ~170K requests/sec can't really be that trivial, even if they're 160 bytes (they're not, since Twitter sends metadata, logs those messages, tracks service metrics, etc. for those messages)

If you treat twitter like a big mailbox, things will work "ok". It's not the worst approach ever, that's for certain. But end-user perceptible performance would be a fraction of what twitter has today.

P.S. How many images does twitter serve up per day at present? That's a tad more than 160 characters of data.

Another key difference is that email users generally contribute directly to their provider's infrastructure costs in providing email as a service. Email infrastructure (and the user experience) is fragmented, and global funding generally scales with global load.

Instead twitter must monetize via advertising of some form, and so the percentage of folks who do not respond to ads acts as a really strong factor in your cost calculations. In this sense, email software has it easy, and can be extremely wasteful in the resources it consumes.

It's not just that the availability expectations of twitter are higher than email, it's also that the economic base of the infrastructure is far more sparse.

And yet another key difference is that there are few email server installations that support half a billion users. Saying that the scaling problem is "solved" because all you have to do is copy, say, gmail, is kind of silly.