by vidarh
579 days ago
> Every single user gets an updated live feed of tweets drawn from every other user -- handling millions of users simultaneously is not easy.

This is a trivial approach, which works but is suboptimal (you can cut down on the I/O with various optimisations): shard by id. Treat each user's feed as a message queue. Think e-mail, with a lookup from public id -> internal id @ shard. Then, additionally, every account that gets more than n followers is sharded into "sub-accounts", where posts to the main account are transparently "reposted" to the sub-accounts, just like a simple mailing list reflector.

(The first obvious optimisation to this is to drop propagation of posts from accounts that will hit a large proportion of users, and weave those into timelines on read/generation instead of writing them to each user. The second is to drop propagation of posts to accounts that have not been accessed for a while, and instead take the expensive generation step of pulling posts to create the timeline the next time they log in. There are many more, but even with the naive approach outlined above this is a solved problem.)
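A minimal in-memory sketch of the scheme above, in Python, under some simplifying assumptions: shards are plain objects rather than separate servers, "sub-accounts" are modelled as chunks of an account's follower list rather than as separately sharded accounts, and the names `Network`, `Shard`, `sub_threshold` and `hot_fraction` are hypothetical. It includes the first optimisation (posts from accounts followed by a large share of users are never pushed and are woven in on read) but leaves out the second one (skipping inactive users).

```python
import heapq
import itertools
from collections import defaultdict


class Shard:
    """One storage shard: holds per-user inboxes, like a mail server holds mailboxes."""

    def __init__(self):
        self.inbox = defaultdict(list)  # internal id -> [(timestamp, author, text), ...]


class Network:
    # Hypothetical knobs: num_shards, sub_threshold (the "n" followers above which an
    # account is split into sub-accounts) and hot_fraction (share of all users an
    # account must reach before its posts are woven in on read instead of pushed).
    def __init__(self, num_shards=4, sub_threshold=1000, hot_fraction=0.1):
        self.shards = [Shard() for _ in range(num_shards)]
        self.sub_threshold = sub_threshold
        self.hot_fraction = hot_fraction
        self.directory = {}                # public id -> (internal id, shard index)
        self.followers = defaultdict(set)  # author -> follower public ids
        self.following = defaultdict(set)  # user -> authors they follow
        self.unpushed = defaultdict(list)  # author -> posts that were never fanned out
        self._clock = itertools.count()    # monotonically increasing timestamps

    def register(self, public_id):
        internal_id = len(self.directory)
        self.directory[public_id] = (internal_id, internal_id % len(self.shards))

    def follow(self, follower, author):
        self.followers[author].add(follower)
        self.following[follower].add(author)

    def sub_accounts(self, author):
        # Partition followers into chunks of at most sub_threshold; each chunk stands
        # in for a sub-account that reflects the main account's posts to its members.
        fs = sorted(self.followers[author])
        return [fs[i:i + self.sub_threshold] for i in range(0, len(fs), self.sub_threshold)]

    def _deliver(self, user, post):
        internal_id, shard = self.directory[user]  # public id -> internal id @ shard
        self.shards[shard].inbox[internal_id].append(post)

    def _is_hot(self, author):
        return len(self.followers[author]) >= self.hot_fraction * len(self.directory)

    def post(self, author, text):
        entry = (next(self._clock), author, text)
        if self._is_hot(author):
            # First optimisation: skip fan-out and merge into timelines at read time.
            self.unpushed[author].append(entry)
            return
        for sub in self.sub_accounts(author):  # main account -> sub-accounts
            for follower in sub:               # sub-account -> its followers
                self._deliver(follower, entry)

    def timeline(self, user, limit=10):
        internal_id, shard = self.directory[user]
        stored = self.shards[shard].inbox[internal_id]
        # Weave in posts that were recorded but never pushed to any inbox.
        pulled = [p for a in self.following[user] for p in self.unpushed[a]]
        return heapq.nlargest(limit, stored + pulled)  # newest first
```

For example, with `sub_threshold=2` an account with three followers fans out through two sub-accounts, while an account followed by nearly everyone is never written to any inbox and only shows up when `timeline()` merges it in. Because the push-or-pull decision is recorded per post, a post is never both stored and pulled, so timelines contain no duplicates even if an account becomes "hot" later.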