by nathanmarz 787 days ago
The chronological timeline at Twitter fans out on write. This makes sense when you consider that the most important application metric is the latency to load the timeline. That latency is a lot lower when you only need one query on the materialized timeline rather than a ton of queries for everyone you follow.
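To make the trade-off concrete, here is a minimal in-memory sketch of the two strategies. It is not Twitter's implementation: the class names, the 800-entry timeline cap, and the `(ts, author, text)` tuple layout are all assumptions made for illustration, and posts are assumed to arrive in timestamp order.

```python
import heapq
from collections import defaultdict, deque
from itertools import islice


class FanOutOnWrite:
    """Each post is copied into every follower's materialized timeline."""

    def __init__(self, timeline_limit=800):  # cap is a hypothetical value
        self.followers = defaultdict(set)  # author -> set of followers
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_limit))

    def follow(self, follower, author):
        self.followers[author].add(follower)

    def post(self, author, ts, text):
        # One write per follower: cheap reads are paid for here.
        for follower in self.followers[author]:
            self.timelines[follower].appendleft((ts, author, text))
        return len(self.followers[author])  # number of writes performed

    def timeline(self, user, n=20):
        # A single lookup on the precomputed timeline, newest first.
        return list(islice(self.timelines[user], n))


class FanOutOnRead:
    """Each post is stored once; timelines are assembled at read time."""

    def __init__(self):
        self.following = defaultdict(set)  # user -> set of authors followed
        self.posts = defaultdict(list)     # author -> posts, oldest first

    def follow(self, follower, author):
        self.following[follower].add(author)

    def post(self, author, ts, text):
        self.posts[author].append((ts, author, text))
        return 1  # a single write, regardless of follower count

    def timeline(self, user, n=20):
        # One "query" per followed account, merged newest first.
        per_author = (reversed(self.posts[a]) for a in self.following[user])
        return list(islice(heapq.merge(*per_author, reverse=True), n))
```

In this sketch, `post` costs one write per follower under fan-out on write, while `timeline` costs one lookup per followed account under fan-out on read. Both produce the same timeline; they differ only in where the work lands.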

Good to learn, thanks. "Everyone you follow" isn't quite what I was saying, but I also don't use Twitter, so I'm doubly in the dark here.

I'm surprised that someone with, say, a Twitter problem and tens of millions of followers - the two seem to go hand-in-hand - drives tens of millions of writes every time they post. But there you go, learned something today.

I was tech lead on that subsystem for a little while in 2010. A lot of smart people thought about the hybrid approach, either by using the search index to drive the timeline or building a custom ring-buffer-based index of all tweets. Ultimately two systems are harder to maintain than one, custom indices are hard, and the low-complexity approach dominated a higher-performance approach.

Also, contrary to popular opinion, we didn't go down when Justin Bieber tweeted, but we did have elevated error rates when large numbers of new Justin Bieber follows put pressure on the MySQL row lock on his following count. In retrospect, lock striping would have helped, but the migration would have been horrific.
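For readers unfamiliar with lock striping, the idea is to split one hot counter into several independently locked slots so that concurrent increments rarely contend for the same lock. The sketch below is an in-memory analogue using Python threads, not the MySQL schema discussed above. `StripedCounter` and the stripe count of 16 are hypothetical, and in a database the rough equivalent would be several counter rows that are summed on read.

```python
import random
import threading


class StripedCounter:
    """A counter split across several independently locked stripes."""

    def __init__(self, stripes=16):  # stripe count is an arbitrary assumption
        self._locks = [threading.Lock() for _ in range(stripes)]
        self._counts = [0] * stripes

    def increment(self, amount=1):
        # Pick a stripe at random so writers spread across the locks
        # instead of all queuing on a single one.
        i = random.randrange(len(self._locks))
        with self._locks[i]:
            self._counts[i] += amount

    def value(self):
        # Sum the stripes, locking each one briefly. The total is not an
        # atomic snapshot if increments are in flight, which is usually
        # acceptable for a displayed count.
        total = 0
        for lock, idx in zip(self._locks, range(len(self._counts))):
            with lock:
                total += self._counts[idx]
        return total
```

The trade-off is that reads become more expensive (one lookup per stripe) in exchange for less write contention, which fits a counter that is incremented in bursts.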

Back in the days of the fail whale, Twitter would go down when Justin Bieber tweeted because their system couldn't handle the load.