| 6 billion clicks is not the hard part; storing the analytics for those clicks is. Even so, this is a pittance compared to what Google or Adobe serve with their analytics. I don't know enough about how "realtime" Bitly is. Handling 6 billion writes a month of "raw" data from a user as a serialized string would not be that hard. That is about 2,300 clicks, and therefore analytics writes, per second averaged over the month, which by the 80/20 rule likely means about 10k clicks per second at peak. Now, assume you immediately do one serialized write with all the data from a user and then do a chasing analysis, so that you don't have to do all the work at peak price, and that each user is then 20 writes. We end up with 10k serialized writes per second at peak, and 5x that in chasing writes, so we need 60k writes per second. On DynamoDB that would cost $90k upfront and $7.71 an hour ($99,500 a month). That is "a lot" but it isn't huge. Doing this on Google AppEngine would likely cost about the same, since you pay a fixed fee per write, not based on your throughput. Depending on the amount of indexing, you would pay $1.80 - $2.40 per 100k clicks based on the above math, so $108k - $144k per month. I am not familiar enough with Azure to quote a price. I know there would be other costs; this is just the database portion. But since I expect this to be the majority of the price, I thought it was the part most worth discussing if you were building a Bitly on a cloud platform. |
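For anyone who wants to check the arithmetic in the quoted comment, here is a rough sketch. It assumes a 30-day month; the 10k peak and the 5x chasing multiplier are the quoted estimates, not measurements, and every variable and function name is made up for illustration.

```python
# Back-of-the-envelope check of the quoted figures.
# Assumption: a 30-day month. Peak rate and multiplier are the quoted guesses.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60

clicks_per_month = 6_000_000_000
avg_clicks_per_sec = clicks_per_month / SECONDS_PER_MONTH

peak_serialized_writes = 10_000   # quoted peak estimate (80/20 rule of thumb)
chasing_multiplier = 5            # quoted: "5x that in chasing writes"
required_writes_per_sec = peak_serialized_writes * (1 + chasing_multiplier)


def monthly_per_write_cost(clicks, dollars_per_100k_clicks):
    """Monthly cost when billing is a flat fee per 100k clicks."""
    return clicks / 100_000 * dollars_per_100k_clicks


low_estimate = monthly_per_write_cost(clicks_per_month, 1.80)
high_estimate = monthly_per_write_cost(clicks_per_month, 2.40)

print(f"average clicks/sec: {avg_clicks_per_sec:.0f}")
print(f"required writes/sec at peak: {required_writes_per_sec}")
print(f"per-write pricing: ${low_estimate:,.0f} to ${high_estimate:,.0f} per month")
```

Note that the 60k figure only holds if the chasing writes are counted at peak, which is the assumption the reply below disputes.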
With chasing writes, you do the work in the slow periods between traffic bursts, since you're basically just pulling items off a queue to process, so you don't need to count them in your peak burst numbers.
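A minimal sketch of that idea, assuming an in-memory `deque` as a stand-in for whatever queue you actually run. The function names and event fields are hypothetical, not what we had in production.

```python
import json
from collections import deque

# Stand-in for a real queue; a plain in-memory deque for illustration only.
raw_queue = deque()


def record_click(queue, short_code, referrer, ts):
    """Peak-time path: one cheap append of the serialized raw event."""
    queue.append(json.dumps({"code": short_code, "referrer": referrer, "ts": ts}))


def drain(queue, process, max_items):
    """Off-peak path: pull up to max_items events and run the expensive analysis."""
    done = 0
    while queue and done < max_items:
        process(json.loads(queue.popleft()))
        done += 1
    return done


# Usage: three clicks arrive in a burst, then a worker catches up later.
for i in range(3):
    record_click(raw_queue, "abc123", "example.com", 1_000 + i)

processed = []
handled = drain(raw_queue, processed.append, max_items=2)
```

Only `record_click` runs during the burst, so the peak write rate is one write per click; `drain` can be throttled to whatever the backend handles comfortably in quiet periods.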
Your costs seem really high too. The above system was ~$10K/month on GoGrid, including 2 DB servers on dedicated hardware (not really impressive ones either; I think they were ~$500/month each), a load balancer on a dedicated server, two dozen or so webservers, a few support servers (admin panels, client interfaces, Puppet, etc.), and a small Hadoop cluster.
Redis would receive the raw data, the DB stored the rolled-up data, and the raw data/logs would be compressed and go onto a small Hadoop cluster in case we needed to process them for a new type of report or look for something specific.
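To give a feel for the roll-up step, here is a rough sketch that collapses raw click events into per-link, per-hour counts. The event shape and the `roll_up` name are assumptions for illustration; the raw events would come out of Redis and the resulting rows would go into the DB, neither of which is shown here.

```python
from collections import Counter
from datetime import datetime, timezone


def roll_up(events):
    """Collapse raw click events into counts keyed by (short code, UTC hour)."""
    counts = Counter()
    for event in events:
        hour = datetime.fromtimestamp(event["ts"], tz=timezone.utc)
        bucket = hour.strftime("%Y-%m-%dT%H:00Z")
        counts[(event["code"], bucket)] += 1
    return counts


# Hypothetical raw events: Unix timestamps in seconds.
raw_events = [
    {"code": "abc123", "ts": 0},
    {"code": "abc123", "ts": 3599},
    {"code": "abc123", "ts": 3600},
    {"code": "xyz789", "ts": 10},
]

hourly = roll_up(raw_events)
```

Each (link, hour) pair becomes a single row to write instead of one row per click, which is why the DB servers did not need to be impressive.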