by bgentry 4582 days ago
Have you worked extensively with ELBs? They need pre-warming in advance of significant traffic changes, they can't handle spikes well because they take many minutes to respond to changes in traffic volume, and they use an ever-changing set of IP addresses for your load balancer nodes.

The GCE load balancer has none of these problems, which gives GCE a huge advantage over AWS and ELBs.

Disclaimer: I'm an engineer at Heroku. We manage dozens of ELBs for ourselves, and thousands of them for our customers.


>They need pre-warming in advance of significant traffic changes

Why?

Because they consist of a set of EC2 instances, the vertical and horizontal scale of which is determined automatically based on the average traffic profile of each node. Once the traffic has increased enough to warrant a scaling event, it takes minutes for new ELB nodes to come online and go into DNS rotation before they can start serving traffic.
It's obnoxious that AWS hasn't developed an API call or web interface option for the ELBs to pre-warm them yourself, vs having to contact AWS support to get the pre-warming done "manually".
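
To make the cost of that scaling lag concrete, here is a minimal sketch of a reactive load balancer during a spike. It is not how ELB works internally: the function name, the per-node capacity, and the provisioning delay are all hypothetical, and it assumes the scaling decision is made the instant the spike begins and that every new node comes online at once when the delay ends.

```python
def simulate_spike(baseline_rps, spike_rps, node_capacity_rps,
                   provision_delay_s, duration_s):
    """Count requests that exceed capacity while new nodes are provisioned.

    All parameters are illustrative assumptions, not measured ELB values.
    """
    # Nodes sized for the baseline, and nodes needed for the spike
    # (ceiling division without importing math).
    nodes = -(-baseline_rps // node_capacity_rps)
    target_nodes = -(-spike_rps // node_capacity_rps)

    overflow = 0
    for second in range(duration_s):
        # New nodes only start serving once the provisioning delay has passed.
        if second >= provision_delay_s:
            nodes = target_nodes
        capacity = nodes * node_capacity_rps
        overflow += max(0, spike_rps - capacity)
    return overflow


if __name__ == "__main__":
    # A 10x spike with a hypothetical 5-minute provisioning delay.
    print(simulate_spike(1_000, 10_000, 1_000, 300, 600))
    # The same spike with capacity already in place (pre-warmed).
    print(simulate_spike(1_000, 10_000, 1_000, 0, 600))
```

With these made-up numbers, the first call reports 2,700,000 requests over capacity and the pre-warmed call reports 0, which is the gap pre-warming is meant to close.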
Out of interest, what is a rough idea of the total requests per second on Heroku for everything? All your nodes or whatever you call them? Dynamos?

Is 1,000,000 requests per second a stupid, pointless number, as no one ever gets 1,000,000 requests per second, or is this some sort of meaningful number?

I can't release any of our traffic numbers, but I can tell you that 1M req/s is an enormous number. For some context, here's a post from Netflix from the end of 2011 where they state that their API received 20,000 req/s at peak: http://techblog.netflix.com/2011/12/making-netflix-api-more-...
Yes, but! The impression I get from this 1 million req/s test is that no actual logic is happening on the backend. E.g. no database queries, no business logic, etc - basically a noop call.

As we saw when running the TechEmpower benchmarks, simply going from the plaintext test to the single database query dropped the best performer from ~600,000 req/s to ~100,000 req/s. Throw in a bit more business logic, another query, and a slightly heavier response, and it is easy to imagine that 1 million req/s sitting much nearer to 20,000 req/s.

My point is that 1 million req/s is a very optimistic number when used in such a comparison. Is it still an impressive max throughput? Yes. I just don't want anyone to think that they can now, say, host 50 Netflixes on this setup.
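
For anyone who wants the arithmetic spelled out, here is a rough sketch using only the figures quoted in this thread. Carrying a slowdown ratio from one benchmark over to a completely different stack is purely an illustrative assumption, and the constant and function names are made up for this example.

```python
# Approximate figures quoted in the thread.
PLAINTEXT_RPS = 600_000      # best plaintext result mentioned above
SINGLE_QUERY_RPS = 100_000   # same test with a single database query
LB_TEST_RPS = 1_000_000      # the load balancer test
NETFLIX_PEAK_RPS = 20_000    # Netflix API peak, end of 2011


def slowdown(before_rps, after_rps):
    """How many times lower the throughput is after the change."""
    return before_rps / after_rps


# Cost of adding one database query in the benchmark.
query_slowdown = slowdown(PLAINTEXT_RPS, SINGLE_QUERY_RPS)

# Total slowdown needed to go from 1M req/s down to Netflix's peak.
total_needed = slowdown(LB_TEST_RPS, NETFLIX_PEAK_RPS)

# What is left for extra queries, business logic, and heavier responses.
remaining = total_needed / query_slowdown

if __name__ == "__main__":
    print(query_slowdown, total_needed, round(remaining, 2))
```

So one query alone accounts for a 6x drop, and the "50 Netflixes" gap closes if everything else costs a bit over 8x more, which does not seem far-fetched for a real API.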

Note: I realize you probably weren't meaning to directly compare those two numbers, but it somewhat read that way. I definitely do appreciate the context though - quite interesting to know that the Netflix API was peaking at ~20,000 req/s in 2011.

This is not the point of the test. The test is about showing that the load balancer in GCE can handle that many requests per second with a single IP address. Whatever the machines behind it are doing doesn't matter, since the load balancer's job is to handle a ton of traffic. This is practically the only case in which responding with 1 byte makes sense in a test.
I completely get that. I responded to the parent because he introduced the 20,000 req/s number as a comparison point.

The Google test is both a theoretical max throughput (that one wouldn't reach under basically any normal use case) and a test of the load balancer capabilities. The Netflix 20,000 req/s number is, instead, a real use case example.

My point was that one shouldn't directly compare those numbers and say, for example, that this GCE setup has 50x better throughput than Netflix.

I imagine that if Netflix were to stub all of their API calls with noops that returned 1 byte responses, they would be able to handle significantly more than 20,000 req/s. Basically, I don't think we actually disagree here.