by user5994461 2156 days ago
Well, a single server doesn't really need to do more than 10Gbps or 100k connections. Going above that is a "simple" matter of horizontal scaling.
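The "simple" horizontal-scaling point can be made concrete with some back-of-the-envelope sizing. This is a minimal sketch that assumes the 10 Gbps and 100k-connection figures above are hard per-server ceilings. `servers_needed` and its `target_util_pct` headroom knob are hypothetical illustrations, not anyone's actual tooling.

```python
# Per-server ceilings quoted in the comment above (assumed to be hard limits).
MAX_GBPS_PER_SERVER = 10
MAX_CONNS_PER_SERVER = 100_000


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, avoiding float rounding surprises."""
    return -(-a // b)


def servers_needed(total_gbps: int, total_conns: int, target_util_pct: int = 100) -> int:
    """Hypothetical sizing helper: how many servers keep both bandwidth and
    connection count at or below target_util_pct of each server's ceiling."""
    if not 0 < target_util_pct <= 100:
        raise ValueError("target_util_pct must be in (0, 100]")
    # Multiply by 100 so the percentage can stay an integer.
    by_bw = ceil_div(total_gbps * 100, MAX_GBPS_PER_SERVER * target_util_pct)
    by_conns = ceil_div(total_conns * 100, MAX_CONNS_PER_SERVER * target_util_pct)
    # The tighter dimension wins; always keep at least one server.
    return max(by_bw, by_conns, 1)


print(servers_needed(250, 3_000_000))      # 30
print(servers_needed(250, 3_000_000, 60))  # 50
```

Whichever dimension runs out first sets the server count. Here connections (30 servers) dominate bandwidth (25 servers), and leaving 40% headroom pushes the count to 50.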

What I wonder about is how you distribute the traffic at the higher level. I imagine there are separate clusters of Envoy proxies serving different configurations/applications/locations? How many datacenters does Dropbox have?

I ran a comparable setup at a large company, all based on HAProxy, and there was a significant amount of complexity in routing requests to applications that could ultimately live in any of 30 datacenters.


We published a large rundown of our traffic infrastructure some time ago[1]. TL;DR:

* The first level of load balancing is DNS[2]. Here we try to map each user to the closest PoP based on metrics from our clients.

* After that, the path from a user to a PoP mostly depends on our BGP peering with other ISPs (we have an open peering policy[3], please peer with us!)

* Within the PoP we use BGP ECMP and a set of L4 load balancers (previously IPVS, now Katran[4]) that encapsulate traffic and forward it to the L7 balancers (previously nginx, now mostly Envoy) using DSR (direct server return).
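The L4 step has to send every packet of a given connection to the same L7 backend, even though ECMP may spread flows across several L4 boxes. Katran's documentation describes using a variant of Maglev consistent hashing for this. Below is a simplified, textbook Maglev lookup table in Python, offered as a sketch and not Katran's actual code (which runs in eBPF). The backend names, the SHA-256-based hash functions, and the table size are all illustrative assumptions.

```python
import hashlib
from typing import List, Tuple

# (src_ip, src_port, dst_ip, dst_port, protocol)
FiveTuple = Tuple[str, int, str, int, int]


def _hash64(data: str, seed: str) -> int:
    """Deterministic 64-bit hash. Python's built-in hash() is salted per process,
    so it would produce different tables on different balancers."""
    digest = hashlib.sha256(f"{seed}:{data}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def build_maglev_table(backends: List[str], m: int = 65537) -> List[int]:
    """Fill a lookup table of prime size m with backend indices (Maglev population)."""
    if not backends:
        raise ValueError("need at least one backend")
    n = len(backends)
    # Each backend walks its own permutation of slots: (offset + j * skip) mod m.
    offsets = [_hash64(b, "offset") % m for b in backends]
    skips = [_hash64(b, "skip") % (m - 1) + 1 for b in backends]
    next_j = [0] * n
    table = [-1] * m
    filled = 0
    while True:
        # Round-robin: each backend claims its next preferred free slot.
        for i in range(n):
            slot = (offsets[i] + next_j[i] * skips[i]) % m
            while table[slot] >= 0:
                next_j[i] += 1
                slot = (offsets[i] + next_j[i] * skips[i]) % m
            table[slot] = i
            next_j[i] += 1
            filled += 1
            if filled == m:
                return table


def pick_backend(flow: FiveTuple, backends: List[str], table: List[int]) -> str:
    """Map a flow's 5-tuple to a backend; the same flow always lands in the same slot."""
    key = "|".join(str(field) for field in flow)
    return backends[table[_hash64(key, "flow") % len(table)]]


backends = ["l7-a", "l7-b", "l7-c"]  # hypothetical L7 (Envoy) hosts
table = build_maglev_table(backends, m=251)
flow = ("198.51.100.7", 51515, "203.0.113.10", 443, 6)
print(pick_backend(flow, backends, table))
```

Because every L4 balancer builds the same table from the same backend list, any of them can forward a given flow, and the round-robin fill gives each backend an almost equal share of slots. Removing one backend remaps mostly the flows that belonged to it.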

Overall, we have ~25 PoPs and 4 datacenters.

[1] https://dropbox.tech/infrastructure/dropbox-traffic-infrastr...
[2] https://dropbox.tech/infrastructure/intelligent-dns-based-lo...
[3] https://www.dropbox.com/peering
[4] https://github.com/facebookincubator/katran

Katran, nice! Any issues with it at all? Do you use it with XDP-capable hardware or just normal driver offload?
It works beautifully. We use driver offload (i40e on the Edge.)
Cool to see someone using Katran in production. Really interesting stack you have there.
Actually, all the props for that go to Katran's author himself. When we hired Nikita V. Shirokov (tehnerd), the first thing he did was replace IPVS with the XDP/eBPF-based Katran, which improved our Edge servers' throughput by 10x, from ~2 Mpps to ~20 Mpps.
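To get a feel for what that 10x means, assume the quoted rates are per Edge server. The time budget per packet is then the reciprocal of the packet rate $R$. If the work is spread over $c$ cores, each core gets $c$ times this budget.

```latex
t_{\mathrm{pkt}} = \frac{1}{R}, \qquad
R = 2 \times 10^{6}\,\mathrm{pps} \;\Rightarrow\; t_{\mathrm{pkt}} = 500\,\mathrm{ns}, \qquad
R = 2 \times 10^{7}\,\mathrm{pps} \;\Rightarrow\; t_{\mathrm{pkt}} = 50\,\mathrm{ns}
```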

He also contributed a lot to the Envoy migration, moving our desktop client onto it and adding perf-related features such as TLS session ticket lifetime support to SDS.

Great. Exactly what I was looking for =)
@SaveTheRbtz

"we have an open peering policy"

That's a bit of a lie, given you have a 50 Mbps minimum requirement before you even consider a peering request.

I would call that Selective, not Open!