by nostrademons
5613 days ago
As has been pointed out elsewhere on the thread, complexity doesn't scale linearly. It's far easier to write ten sites that each get 30 qps than it is to write one site that gets 300 qps, and it's easier to write ten sites that each get 300 qps than one site that gets 3000 qps. Twitter's fundamental problem is also harder to scale than that of something like Heroku or WordPress. For those hosted sites, you can shard easily by host, so that each of the 100,000 Heroku-hosted sites can get its own EC2 instance(s) and behave pretty much independently. You can't do that when the point of your site is that any action might instantly be broadcast to thousands of followers. High-fanout writes are not an easy problem to solve.
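To make the fanout point concrete, here is a minimal in-memory sketch of fan-out-on-write, where one post turns into one write per follower. The class name, method names, and the 800-entry timeline cap are all made up for illustration; this is not a description of how Twitter actually does it.

```python
from collections import defaultdict, deque


class FanoutTimeline:
    """Toy fan-out-on-write store (hypothetical names, in-memory only)."""

    def __init__(self, timeline_limit=800):
        # timeline_limit is an arbitrary illustrative cap, not a real figure.
        self.followers = defaultdict(set)
        self.timelines = defaultdict(lambda: deque(maxlen=timeline_limit))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def post(self, author, text):
        """Push the post to every follower's timeline; return the write count."""
        targets = self.followers[author]
        for user in targets:
            self.timelines[user].appendleft((author, text))
        return len(targets)

    def timeline(self, user):
        return list(self.timelines[user])


store = FanoutTimeline()
for name in ("alice", "bob", "carol"):
    store.follow(name, "popular_user")

# One post by an account with 3 followers costs 3 timeline writes.
writes = store.post("popular_user", "hello")
```

The write count grows with the follower count of the author, and the followers can live on any shard, which is why sharding by host does not help here the way it does for independent hosted sites.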
------------------
Color me unimpressed.
At some point, I was collecting 40GB/day of financial data (and that's after bzip2ing it; probably 200GB/day before). This was done on hardware costing $30K (two equivalent machines with 4GB each, each having 20*1TB in a RAID configuration, set up as a hot backup), and the operation was run (coded, supervised, administered) by 2 people.
I'm extrapolating from your numbers: let's say you have 70GB over 14 days = 5GB/day. Let's assume Twitter has 100GB/day of tweet text (which, incidentally, means ~1 billion tweets per day, which I highly doubt, as they took a few years to get to the 1 billion mark, and last I heard they were at less than 100 million tweets/day).
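The arithmetic behind that estimate can be checked in a few lines. The ~100 bytes per tweet figure is an assumption; it is simply what "100GB/day means ~1 billion tweets" implies, and GB is taken as 10^9 bytes.

```python
GB = 10**9  # decimal gigabytes, an assumption for round numbers

# Figures quoted in the comment.
sample_gb, sample_days = 70, 14
sample_gb_per_day = sample_gb / sample_days  # 5.0

assumed_daily_bytes = 100 * GB    # the generous 100GB/day guess
bytes_per_tweet = 100             # assumption implied by 100GB ~ 1 billion tweets

tweets_per_day = assumed_daily_bytes // bytes_per_tweet  # 1_000_000_000

# At the "less than 100 million tweets/day" figure, the text volume is far lower.
reported_tweets_per_day = 100_000_000
reported_gb_per_day = reported_tweets_per_day * bytes_per_tweet / GB  # 10.0

print(sample_gb_per_day, tweets_per_day, reported_gb_per_day)
```

So the 100GB/day assumption overshoots the reported tweet volume by roughly a factor of ten, which only makes the rest of the estimate more conservative.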
Then, in this day and age (numbers chosen for 2 years ago, when they had their last infrastructure revision), what you do is buy 20 servers with 8GB of memory each (for, say, $5K each), plus a little redundancy, and store all the latest tweets in memory, and the most popular users' older tweets as well; everything else goes on disk. Throw in cheap web front-ends that don't even need a local disk, load balancing, and a gigabit ethernet backplane. You're still under $200K in equipment.
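A rough sanity check of that hardware budget, using only the numbers above. The two daily volumes (100GB/day as the generous guess, 10GB/day at about 100 bytes per tweet and 100 million tweets/day) are assumptions carried over from the estimate, not measurements.

```python
servers = 20
ram_gb_per_server = 8
cost_per_server = 5_000      # dollars, "say $5K each"
equipment_budget = 200_000   # dollars, "under $200K in equipment"

total_ram_gb = servers * ram_gb_per_server           # 160
server_cost = servers * cost_per_server              # 100_000
left_for_everything_else = equipment_budget - server_cost  # 100_000

# How many days of tweet text fit in RAM (ignoring indexes and overhead).
days_in_ram_generous = total_ram_gb / 100  # at 100GB/day -> 1.6
days_in_ram_reported = total_ram_gb / 10   # at 10GB/day  -> 16.0

print(total_ram_gb, server_cost, left_for_everything_else)
print(days_in_ram_generous, days_in_ram_reported)
```

That leaves about half the equipment budget for redundancy, front-ends, load balancing, and networking, and somewhere between a day and a half and two weeks of raw text in memory depending on which volume you believe.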
Yes, the code is not going to be trivial, but for $100K and 3 months you can get a stunt programmer (I know a few who can do it and won't charge as much).
A run-of-the-mill RDBMS is the wrong tool for this job; basically, run-of-the-mill anything is. But that does not make it incredibly hard.
I think that for $300K in hardware and software, you can get a Twitter clone that performs as well.
Twitter is successful, but that's not thanks to good engineering.