Hacker News new | ask | show | jobs
by tzs 5613 days ago
Handling on average something like 76000 status updates per second (much higher when something interesting happens in the world), each of which must quickly become available to anywhere from 1 to a million clients.

Wordpress.com is trivial compared to Twitter.

4 comments

You're an order of magnitude past their peak load - New Years Eve they set a new record of almost 7000/second.

http://thenextweb.com/twitter/2011/01/06/new-years-eve-set-a...

I don't believe Twitter is trivial, but I think they're perceived as more complex than they are - WordPress + Reddit + Heroku only have like 10% that number combined. Apples to oranges, but those 3 companies would be doing more everythings per second than Twitter when you combine them.

As has been pointed out elsewhere on the thread, complexity doesn't scale linearly. It's far easier to write ten sites that each gets 30 qps than it is to write one site that gets 300 qps, and it's easier to write ten sites that get 300 qps than one site that gets 3000 qps.

Twitter's fundamental problem also is a harder one to scale than something like Heroku or Wordpress. For those hosted sites, you can shard easily by host, so that each of the 100,000 Heroku-hosted sites can get its own EC2 instance(s) and behave pretty much independently. You can't do that when the point of your site is that any action might instantly be broadcast to thousands of followers. High-fanout writes are not an easy problem to solve.

But if you design it well enough, it does. Here's a copy-pasta from a reddit comment of mine from a few months ago: http://www.reddit.com/r/programming/comments/b2u6t/twitter_o...

------------------

Color me unimpressed.

At some point, I was collecting 40GB/day of financial data (and that's after bzip2ing them .. probably 200GB/day before); This was done on hardware costing $30K (which was two equivalent machines with 4GB, each having 20*1TB in a raid configuration -- this was a hot-backup configuration) and the operation run (coded, supervised, administered) by 2 people.

I'm extrapolating from your numbers: Let's say you have 70GB over 14 days = 5GB/day; Let's assume Twitter has 100GB/day of text twits (which, incidentally, means ~1 billion tweets per day which I highly doubt, as they took few years to get to the 1 billion mark, and last I heard they were at less than 100 million twits/day)

Then, at this day and age (numbers selected for 2 years ago, when they had their last infrastructure revision), what you do is buy 20 servers with 8GB of memory each (for, say $5K each), plus a little redundancy, and store all the latest twits in memory, and the most popular user's older twits as well; everything else on disk. Throw in cheap web front-ends that don't even need a local disk, load balancing, and a gigabit ethernet backplane. You're still under $200K in equipment.

Yes, the code is not going to be trivial, but for $100K and 3 months you can get a stunt programmer (I know a few who can do it and won't charge as much).

A run-of-the-mill RDBMS is the wrong tool for this job; Basically, run of the mill anything; but that does not make it incredibly hard.

I think $300K for hardware and software, you can get a Twitter clone that performs as well.

Twitter is successful, but that's not thanks to good engineering.

Be careful extrapolating success scaling in one domain to success scaling in another.

I've also done the 40 GB/day of NYSE TAQ data financial analysis thing, and the 1000+ trades/second real-time financial analytics thing. And I work on Google Search, and have a passing familiarity with how other Google products scale.

The scaling challenges of batch financial models vs. real-time financial processing vs. information retrieval vs. email vs. social products are very different. Even going from a model of the web where it's static and changes every few months (like Google of 2004) to one where sites get update every few minutes and users expect to see the updates immediately in search results (like Google of today) requires vastly different technology.

The main thing about scaling that I've learned from working at a couple places that require it is to go into it with a fresh mind each time, and really pay attention to what the requirements are and what you can cut corners on. There're some general principles you should know (eg. Jeff Dean's "Numbers you should know", memory is much faster than disk, cut out layers of abstraction that you don't need), but in order to apply them effectively, you really need to pay attention to the details of your problem domain.

If you think you can solve Twitter's scaling problems, they're hiring, they're pre-IPO, and they're probably giving out decent chunks of stock.

> Be careful extrapolating success scaling in one domain to success scaling in another.

I agree about that.

> go into it with a fresh mind each time, and really pay attention to what the requirements are and what you can cut corners on. There are some general principles you should know: memory is much faster than disk, cut out layers of abstraction that you don't need, etc - but in order to apply them effectively, you really need to pay attention to the details of your problem domain.

(slightly edited) - This is golden.

However:

> If you think you can solve Twitter's scaling problems, they're hiring, they're pre-IPO, and they're probably giving out decent chunks of stock.

I know I can solve Twitter's scaling problems (I don't think the solution I posted is the end-all-be-all, and for all I know that might not be where their scale problem is -- it is just perceived and argued about this part, which is not very hard).

However, Twitter's abysmal uptime (for the kind of sevice they are providing) had no bearing on their growth in 2008-2009. And even if by re-architecting Twitter they can save $2M/year on operations, it would be dumb to do that before they're in the black for a while and can identify their real profit and loss centers.

Also, those stock are not worth quite as much as people think when you take everything into account. (I've got a successful exit as a non-founder behind me; I'm intimately familiar with all the gory details including taxes, dilution, etc -- Whether options or RSUs, if you are granted anything of value, you have to pay full taxes AT THE TIME OF THE GRANT).

My point was only to show how non-impressive the problem twitter is (supposedly) facing. It's a repeating discussion:

  - Twitter sucks
  - No they don't, they do xxx and it's damn hard
Don't know why I even bother anymore. A company that had (maybe still has?) their millions-of-views-a-day pages created in Ruby doesn't care about solving scale issues.

At Google, you guys throw out closing paragraph tags from the main page when it is clear it renders fine without them.

Yeah, the number I had was for total API calls (6 billion a day), not just updates. http://techcrunch.com/2010/09/17/twitter-seeing-6-billion-ap...
Handling updates to 7000 user accounts per second isn't what makes it hard for twitter. It is the propagation of these updates to a gazillion followers that taxes the system. Their primary overhead is the messaging update to followers. Ashton Kutcher has 6 million followers. Sending instantaneous notifications to all those million twitter clients is what makes it a killer.

TLDR; It is not the vertices in the twitter graph but the number of edges in it that is unprecedented.

Unprecedented, except is it really unprecedented or are we just looking at it from the sexy social startup whatever must say graph and also disrupt angle, where everything is new and super hard and nobody has problems that have ever been solved before.

RSS strikes me as an almost identical process other than the time subscribers wait to check for new content, and there are feeds with millions of subscribers. So now the problem is reduced to doing it in a timely fashion.

This is why I wonder if they're really that immensely complicated - isn't Google doing much the same thing, and possibly even at a bigger scale at one point, with their reader and FeedBurner?

"isn't Google doing much the same thing, and possibly even at a bigger scale at one point, with their reader and FeedBurner?"

Another company with a ton of employees.

Probably not that many on the FeedBurner team itself, but a lot of people at Google working on the general problem of keeping things unreasonably responsive at massive scale.

This is exactly the reason why every IM network has a limit to the number of contacts that you can have. I recall reading several years ago that this was an incredibly hard problem for IM providers to solve, and AFAIK none of them have allow you to have an infinite number of contacts.
Yes but that's handled by hardware, not by people, and I am sure only a few critical coders handle the software performace.

WordPress.com is not as trivial as you might think http://en.wordpress.com/stats/

Certainly it's just a matter of scaling once you hit a certain level of volume, you just have to be able to bring more servers online into the grid.

Scaling from 10,000 users to 1 million is probably very hard.

Scaling from 1 million to 100 million, well you better have a pattern down that works with easy hardware replication (like google does).

Understand that Google is continuously rewriting their infrastructure to handle increased scale. That's what they have 25k employees for.

Jeff Dean's rule of thumb is that you should build a growth factor of 10 into the design, but any more than that and you will probably have to re-architect anyway. So going from 10,000 to 1 million and 1 million to 100 million are probably roughly equivalent in difficulty.

How many of those 25k employees actually handle any of the infrastructure scaling?
A fairly large percentage of them, and many of the ones whose direct job responsibility isn't infrastructure (like me) frequently have to deal with the consequences of building for scale as they develop features.
The big difference between blog hosting and things like Twitter (or Reddit, or Digg, etc) is that blogs are independent, so adding servers scales you up linearly. When you are looking at X's blog, basically everything you see is coming from one server.

You will have to have something that deals with mapping URLs in the unified logical name space of your site to the individual servers that the particular blogs on--that's the part that you can't just throw machines at and get good results.

With the social sites, you can't isolate things easily, because what a given person sees at any time is drawn from an ever changing set of content from other users, with each viewer drawing from a different set.

Based on the stats on the site, I calculate that Wordpress.com is averaging about 10 writes and around 8800 reads per second.

edit: Correction, ~880 reads per second.

Looking at the posts per day detail I think it's more like 30 per second with bursts that probably go to twice that, while nighttime is idle depending on timezones.

But I agree it's a fraction of 7k/sec peak for twitter.

However, twitter does not have to parse html, has no plugins to execute or templates, and has a max string length of 140 characters which is trivial.

Each post/comment published on wp.com takes many, many more cpu cycles than twitter.

I agree that Wordpress requires far more CPU time (I run a network of Wordpress blogs).

My figures: 900,000 transactions (500,000 posts + 400,000 comments) / 86,400 seconds per day:

~= 10 transactions per second on average, though like twitter, activity probably has a power law distribution corresponding with US daylight hours.

Reads: 2.3 billion / month Suppose a 30-day month, that's 86,400*30 or 2,592,000 seconds per month.

2,300,000,000 / 2,592,000 ~= 887 pageviews per second.

I was incorrect by an order of magnitude, though the same uneven traffic patterns caveat applies.

Just to put that into perspective, right now, you can buy off the shelf from Tibco an appliance for routing 10 million messages/sec (at an average message size of 50 bytes).

What Twitter does is not hard, and they do it badly.

Having used Tibco and having written high-performance message routing services for investment banks, I'd like to say yes it is hard, and Tibco does it badly.
I'm not sure where you got your numbers, but based on it, there are 6.5 billion new status everyday. I don't think it's quite accurate.