by StreamBright
3439 days ago
WhatsApp went with Erlang from the very beginning, and it suits their needs perfectly: you can map messages in WhatsApp almost 1:1 to messages in Erlang. On top of that, they optimized the hell out of their stack[1]. Twitter, on the other hand, is a very different problem: you need to broadcast messages in a 1:N fashion, where N can be 100,000,000 (KATY PERRY @katyperry. Followers 95,366,810). On top of that, they need extensive analytics on their users so they can target them in the ad system. I am pretty sure there is some room for optimisation in their stack, though I'm not sure what percentage of these servers could be saved.

[1] http://www.erlang-factory.com/upload/presentations/558/efsf2...
The broadcast problem is trivially handled by splitting large follower lists into trees and introducing message reflectors. Twitter's message count is high for a public IM system, but it's not that high an overall messaging volume by the standards of private/internal message flows. More importantly, despite the issue of large follower counts, breaking large accounts into trees of reflectors makes the problem decompose neatly, and federating large message flows like this is a well-understood problem.
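A rough sketch of the "split large follower lists into trees of reflectors" idea, in Python. Everything here is hypothetical and illustrative (the `Reflector` class, `build_tree`, `deliver`, and the `fanout` cap are my names, not Twitter's or any mail server's): each reflector forwards a message to at most `fanout` children, so no single node ever has to send to 95 million recipients, and the tree depth only grows with log_fanout(N).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class Reflector:
    """Hypothetical relay node: forwards each message to its children, which are
    either follower IDs (leaves) or further reflectors."""
    children: List[Union[int, "Reflector"]] = field(default_factory=list)


def build_tree(followers: List[int], fanout: int) -> Reflector:
    """Split a follower list into a tree in which no node sends more than `fanout` copies."""
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level: List[Union[int, Reflector]] = list(followers)
    # Group each level into reflectors of at most `fanout` children until the
    # remaining top level is small enough to hang off a single root node.
    while len(level) > fanout:
        level = [Reflector(level[i:i + fanout]) for i in range(0, len(level), fanout)]
    return Reflector(level)


def deliver(node: Reflector, message: str, inboxes: Dict[int, List[str]]) -> int:
    """Push `message` down the tree; return the most copies any single node sent."""
    busiest = len(node.children)
    for child in node.children:
        if isinstance(child, Reflector):
            busiest = max(busiest, deliver(child, message, inboxes))
        else:
            inboxes.setdefault(child, []).append(message)
    return busiest


def depth(node: Reflector) -> int:
    """Number of reflector levels between the author and the followers."""
    return 1 + max((depth(c) for c in node.children if isinstance(c, Reflector)), default=0)


def levels_needed(n: int, fanout: int) -> int:
    """Smallest d such that fanout**d >= n, i.e. the tree depth for n followers."""
    d, capacity = 1, fanout
    while capacity < n:
        capacity *= fanout
        d += 1
    return d


tree = build_tree(list(range(10_000)), fanout=100)
inboxes: Dict[int, List[str]] = {}
print(len(inboxes) if False else deliver(tree, "hello", inboxes))  # 100
print(len(inboxes), depth(tree))                                  # 10000 2
print(levels_needed(95_366_810, 1000))                            # 3
```

With a (hypothetical) cap of 1,000 sends per reflector, an account the size of @katyperry's needs only three levels of reflectors.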
I've half-jokingly said in the past that you could replace a lot of Twitter's core tweet transmission with mail servers and off-the-shelf mailing-list reflectors, plus some code to create mailboxes for accounts and reflectors to break up large follower lists. (No, it wouldn't be efficient, but the point is that distributing message transfers, including reflecting messages to large lists, is a well-understood problem.) Based on the mail volumes I've handled with off-the-shelf servers, I'll confidently say that handling hundreds of millions of messages a day that way is not all that hard with relatively modest server counts.
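A back-of-the-envelope version of the "hundreds of millions of messages a day on modest server counts" claim. The inputs are assumptions for illustration only, not figures from Twitter or from my own servers: 500 million messages/day, a peak of 3x the daily average, and 1,000 deliveries per second per mail server.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate(messages_per_day: int) -> float:
    """Average messages per second over a full day."""
    return messages_per_day / SECONDS_PER_DAY


def servers_needed(messages_per_day: int, peak_factor: float, per_server_rate: float) -> int:
    """Servers required to absorb the peak rate, given an assumed per-server throughput."""
    peak = average_rate(messages_per_day) * peak_factor
    return math.ceil(peak / per_server_rate)


# Hypothetical inputs: 500M messages/day, peaks at 3x the daily average,
# and each mail server sustaining 1,000 deliveries per second.
print(round(average_rate(500_000_000)))          # 5787
print(servers_needed(500_000_000, 3.0, 1_000))   # 18
```

Even if the per-server figure is off by an order of magnitude, the result stays in the hundreds of servers, not the thousands.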
Fast delivery of tweets via reflectors for extreme accounts would be the one thing that could drive the server count up, but there are also plenty of far more efficient ways of handling it (e.g. extensive caching, plus pulling rather than pushing for the most extreme accounts).
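A toy, single-process model of "pull rather than push for the most extreme accounts". The `HybridTimelines` class, the `pull_threshold` cut-off and the in-memory `pull_cache` (a plain list standing in for real caching) are all hypothetical; this only illustrates the shape of the trade-off, not how Twitter actually does it.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Tweet = Tuple[int, str, str]  # (sequence number, author, text)


class HybridTimelines:
    """Toy hybrid fan-out: push for ordinary accounts, pull for extreme ones.

    Tweets from authors with more than `pull_threshold` followers are written once
    to a per-author cache and merged into each reader's timeline at read time;
    all other tweets are pushed into every follower's inbox when posted.
    """

    def __init__(self, pull_threshold: int) -> None:
        self.pull_threshold = pull_threshold
        self.followers: Dict[str, Set[str]] = defaultdict(set)
        self.following: Dict[str, Set[str]] = defaultdict(set)
        self.inbox: Dict[str, List[Tweet]] = defaultdict(list)       # pushed tweets per reader
        self.pull_cache: Dict[str, List[Tweet]] = defaultdict(list)  # tweets per extreme author
        self.seq = 0

    def follow(self, user: str, author: str) -> None:
        self.followers[author].add(user)
        self.following[user].add(author)

    def post(self, author: str, text: str) -> int:
        """Store a tweet and return how many writes the fan-out cost."""
        self.seq += 1
        tweet = (self.seq, author, text)
        if len(self.followers[author]) > self.pull_threshold:
            self.pull_cache[author].append(tweet)  # one write, however many followers
            return 1
        for user in self.followers[author]:
            self.inbox[user].append(tweet)
        return len(self.followers[author])

    def timeline(self, user: str, limit: int = 20) -> List[str]:
        """Merge pushed tweets with cached tweets from followed extreme accounts, newest first."""
        items = list(self.inbox.get(user, []))
        for author in self.following.get(user, set()):
            items.extend(self.pull_cache.get(author, []))
        items.sort(reverse=True)
        return [text for _, _, text in items[:limit]]


t = HybridTimelines(pull_threshold=2)
for u in ("a", "b", "c"):
    t.follow(u, "celeb")
t.follow("a", "nobody")
t.follow("b", "nobody")
print(t.post("celeb", "hello"))  # 1
print(t.post("nobody", "hi"))    # 2
print(t.timeline("a"))           # ['hi', 'hello']
print(t.timeline("c"))           # ['hello']
```

Push makes reads cheap but costs one write per follower; pull costs a single write but makes every reader of an extreme account do a merge, which is why it only pays off when those accounts' recent tweets are cached aggressively.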
Note, I'm not saying Twitter doesn't have a legitimate need for the servers they use; their web app does a lot of expensive history/timeline generation on top of the core message exchange, for example. And the number of servers alone doesn't say much without knowing their chosen trade-off between server size/cost and server count. But the core message exchange should not be where the complexity is unless they're doing something very weird.
[1] Taking snapshots of their analytics and of the API follower/following counts shows that they don't agree, and the analytics numbers change after the fact on a regular basis.