by timaelliott 5085 days ago
Yes.

In 2008/2009, another engineer and I built an ad platform that received around 500M impressions and 5M clicks per day. And it wasn't just recording a tweet or publishing out to followers. We took the user's input query, did keyword/relevancy targeting and geofiltering, matched it to advertisers, and delivered back a large result set of adverts. All within 100ms.

Our platform was also apache, mod_php, memcached, mysql and rabbitmq. So definitely not the most optimal of platforms by any means. We had two colos with ~20 servers (dell r410s) at each facility.

Twitter just recently announced 400M tweets/day. I'm not trying to brag about my experiences, because looking back now we made numerous amateur mistakes, but just showing that Twitter's "scale" is a joke compared to everyday challenges at any large internet ad network.
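For a rough sense of what those daily totals mean per second, here is a back-of-the-envelope sketch. It only uses the figures quoted above; the even split across 40 servers is an assumption (the comment says "~20" per colo), and these are averages, not peak rates.

```python
# Average per-second rates from the daily figures quoted in the comment.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average events per second for a given daily total."""
    return per_day / SECONDS_PER_DAY


impressions_per_day = 500e6  # ad platform impressions (from the comment)
clicks_per_day = 5e6         # ad platform clicks (from the comment)
tweets_per_day = 400e6       # Twitter's announced tweets/day (from the comment)
servers = 2 * 20             # assumption: exactly 20 servers in each of 2 colos

impressions_per_sec = per_second(impressions_per_day)
clicks_per_sec = per_second(clicks_per_day)
tweets_per_sec = per_second(tweets_per_day)
impressions_per_server = impressions_per_sec / servers

print(f"impressions/s: {impressions_per_sec:,.0f}")
print(f"clicks/s: {clicks_per_sec:,.0f}")
print(f"tweets posted/s: {tweets_per_sec:,.0f}")
print(f"impressions/s per server: {impressions_per_server:,.0f}")
```

Real traffic is bursty, so peak load would sit well above these averages.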

5 comments

You understand that 400M tweets a day is the number of tweets posted to their system, right? That speaks not at all to the consumption of those tweets, which is the metric you're using for your ad platform.

Additionally, they don't just deal with 160 characters, because again, somehow you're still talking about data being posted, and not data being consumed. Data is consumed off their site via polling APIs, streaming APIs, and a website, all of which are pushing those 400M tweets a day out to plenty of consumers.

They may not have as ridiculous a scale as they act like they do. But let's be clear: it is nowhere near as trivial as you make it out to be, either. Armchair quarterbacking is always easy, because you aren't exposed to the complexity that arises when you've spent a few months and years hitting the corner cases of the problem you're commenting on.

So you had 500M reads on a relatively static data set + 5M writes on an unrelated log? Sounds like a fun problem, but I agree it doesn't sound like rocket science. On the other hand, it also doesn't sound like Twitter, which has 400M writes per day and 400*x million reads on that very dynamic data set. Just seems that's a slightly harder problem.
Adserving is not really static. Cachebusters are named so for good reason. Nowadays ad server developers are clever enough to separate click tracking and impression tracking (the non-Enterprise version of OpenX still deserves a lot of ಠ_ಠ though).

In an RTB environment, there is an additional constraint of having to serve up your ad (or decision) within 60ms (Google ADX sets a hard limit of 80ms), and the fastest best bid wins.

I don't think that's a less hard problem compared to Twitter, especially at high volumes. You can't just say "scale sideward!".
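To make the deadline constraint concrete, here is a minimal sketch of an auction that drops late bids. The `Bid` type, the `pick_winner` function, and the tie-break rule (highest price first, then lowest latency) are all hypothetical; this is one reading of "the fastest best bid wins", not how any particular exchange actually does it.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Bid:
    bidder: str        # hypothetical bidder name
    price_cpm: float   # bid price
    latency_ms: float  # time the bidder took to respond


def pick_winner(bids: List[Bid], deadline_ms: float = 60.0) -> Optional[Bid]:
    """Return the highest on-time bid; break price ties by lower latency."""
    # Bids that miss the deadline are discarded no matter how high they are.
    on_time = [b for b in bids if b.latency_ms <= deadline_ms]
    if not on_time:
        return None
    # Assumed rule: best price wins, faster response wins a tie.
    return max(on_time, key=lambda b: (b.price_cpm, -b.latency_ms))


bids = [
    Bid("a", 2.10, 45.0),
    Bid("b", 3.50, 95.0),  # highest price, but past the 60ms deadline
    Bid("c", 2.10, 30.0),
]
winner = pick_winner(bids)
print(winner.bidder if winner else "no bid made the deadline")
```

The point is that a better bid that arrives late is worth nothing, which is why latency is a hard constraint rather than a nice-to-have.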

That said, the first link was totally misleading. I was actually quite shocked to see that Twitter only had 42M uniques per month, because a typical ad network does a lot more.

EDIT: ah.. 15B requests/day makes more sense. Wtf is with the wrong stats?

Are you talking about 15B vs. the visits chart I linked? If so, the 15B number comes from API calls, which do not have to happen through the website (think of all the Twitter clients).
Requests are requests. 15B is a gigantic amount.
Right, I agree 100%. I just couldn't tell if you were trying to reconcile the 15B with the 45M number from Compete.
See my edit. Your ad impressions reached approximately 3% of Twitter's daily request load last year. Note that those requests can serve up to 200 tweets + metadata.

This doesn't account for Twitter's budding ad service, which one can assume has some of the same functionality (targeted advertising, information retrieval) as traditional ad networks.
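The 3% figure can be checked with quick arithmetic. This sketch uses only the numbers quoted in this thread (500M impressions/day, 15B requests/day, up to 200 tweets per request); the last value is an upper bound that assumes every request returns the maximum, which is certainly not the case in practice.

```python
# Figures quoted in the thread.
ad_impressions_per_day = 500e6    # the ad platform's impressions
twitter_requests_per_day = 15e9   # Twitter's quoted API request load
max_tweets_per_request = 200      # "up to 200 tweets + metadata"

# Share of Twitter's daily request load that the ad platform represents.
share = ad_impressions_per_day / twitter_requests_per_day

# Upper bound only: assumes every single request returns the full 200 tweets.
max_tweets_served_per_day = twitter_requests_per_day * max_tweets_per_request

print(f"ad impressions as share of Twitter requests: {share:.1%}")
print(f"upper bound on tweets served per day: {max_tweets_served_per_day:.1e}")
```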

You are off by nearly 2 orders of magnitude from Twitter's scale. They have billions of views per day, and each of those views is a stream composed of hundreds of different sub-streams.
Add to that, the challenges of sub-60ms RTB. All the fun!