by remram
1438 days ago
So three failures:
- The load balancer lost its connection to etcd and did not reconnect
- The load balancer had no healthy backend and did not un-advertise itself
- The load balancer did not report either of those issues to monitoring
Honestly this is a little concerning. Are they using their own load-balancing software? If yes, why?
To your final question: yes, we are using our own load balancing software. We are building a global hosting platform that needs to be able to run on bare metal servers, not an end-user application where load balancing is an afterthought. As such, we cannot use much of the software that a "regular" SaaS application may be able to. Some constraints our system needs to satisfy:
- Our load balancers handle routing to hundreds of thousands of unique deployments (services), all of which need to be accessible and routable within milliseconds of a request coming in.
- We need to terminate TLS connections for thousands of unique domains.
- We need to carefully control TLS handshakes so that we can prewarm downstream services for an imminent request to a given deployment, based on the SNI in the TLS ClientHello, before we have even received an HTTP request.
- The system needs to handle hundreds of millions of hourly requests.
- The system needs to be able to run on bare metal.
- We currently handle 34 regions globally (up from 28 at the start of the year), which means that all of the data needed to fulfill the above requirements needs to be accessible from all of our PoPs in a matter of milliseconds.
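To make the SNI prewarming constraint concrete, here is a minimal sketch in Python. It is an illustration only, not a description of our actual load balancer: the `PrewarmRouter` class, the routing table, and the `prewarm` hook are all hypothetical names invented for this example. The one real API it leans on is the callback signature of Python's `ssl.SSLContext.sni_callback`, which fires once the ClientHello has been parsed and before any HTTP bytes arrive.

```python
import threading
from typing import Callable, Dict, Optional


class PrewarmRouter:
    """Maps SNI hostnames to deployments and prewarms each one at most once.

    Hypothetical sketch: a real system would also expire warm state and
    fetch routes from a replicated store instead of an in-memory dict.
    """

    def __init__(self, routes: Dict[str, str], prewarm: Callable[[str], None]):
        self._routes = routes      # hostname -> deployment id (assumed table)
        self._prewarm = prewarm    # assumed hook that wakes a deployment
        self._warm = set()
        self._lock = threading.Lock()

    def on_client_hello(self, server_name: Optional[str]) -> Optional[str]:
        """Call as soon as the SNI is known, before the HTTP request."""
        if server_name is None:
            return None  # client sent no SNI, nothing to route on yet
        deployment = self._routes.get(server_name.lower())
        if deployment is None:
            return None  # unknown hostname
        with self._lock:
            first_time = deployment not in self._warm
            self._warm.add(deployment)
        if first_time:
            self._prewarm(deployment)
        return deployment


def make_sni_callback(router: PrewarmRouter):
    """Adapts the router to the ssl.SSLContext.sni_callback signature."""

    def callback(ssl_socket, server_name, ssl_context):
        router.on_client_hello(server_name)
        return None  # returning None lets the handshake continue

    return callback
```

With a real server one would assign `context.sni_callback = make_sni_callback(router)` on an `ssl.SSLContext`. The point of the design is that the routing lookup and the wake-up call happen during the handshake, so the deployment is already starting while the client is still finishing TLS.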
For many companies, global load balancing is something they can outsource to AWS, GCP, or Cloudflare. For us, this is core "business logic" that we need to have full control over. It's difficult for us to outsource, and it's questionable whether it would be wise for us to do so. Building new systems is always a complex undertaking, and there will be some stumbling blocks along the way, but they can be overcome. We are still bullish that our path is the right one, even if we still have a lot of work ahead.
(If this seems interesting, and you want to work with us on building load balancers, among other things: https://deno.com/jobs)