by ak217 1964 days ago

I don't think that's true. Slack seems to have their core online services split across a number of VPCs, and for some reason decided to use Transit Gateway to connect them. Transit Gateway is a special-purpose solution geared toward cross-region and on-prem-to-VPC connections in corporate networks, not toward global high-traffic consumer products. It's the wrong tool for the job. Its architecture is antithetical to the other horizontally scalable AWS solutions: it introduces a single (up to) 50 Gbps network hub that all inter-service traffic must go through. Native AWS architectures avoid such central hubs and provide a virtual routing fabric instead.

Slack could have chosen one of many other AWS design patterns such as VPC peering, transit VPC, IGW routing, or colocating more services in fewer VPCs (with more granular IAM role policies to separate operator privileges), to provide an automatically scaled network fabric to connect their services.
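To make the hub-versus-fabric point concrete, here is a rough back-of-the-envelope sketch in Python. It is only a toy model, not anything from Slack's report or the AWS API: the VPC names and flow sizes are made up, the 50 Gbps ceiling is the figure quoted above, and the function names are hypothetical. It contrasts pushing every inter-VPC flow through one hub with connecting each pair of VPCs directly (as full-mesh VPC peering would).

```python
from itertools import combinations

# Hub ceiling quoted in the comment above; treated here as an assumption.
HUB_CAPACITY_GBPS = 50.0


def hub_utilization(flows_gbps, hub_capacity_gbps=HUB_CAPACITY_GBPS):
    """Fraction of hub capacity used when every inter-VPC flow crosses one hub."""
    return sum(flows_gbps.values()) / hub_capacity_gbps


def full_mesh_links(vpcs):
    """Pairwise links needed to connect every VPC directly to every other one."""
    return list(combinations(sorted(vpcs), 2))


# Hypothetical traffic matrix (Gbps) between three illustrative VPCs.
flows = {
    ("web", "api"): 20.0,
    ("api", "db"): 25.0,
    ("web", "db"): 10.0,
}
vpcs = {name for pair in flows for name in pair}

utilization = hub_utilization(flows)   # all 55 Gbps share the single hub
links = full_mesh_links(vpcs)          # n * (n - 1) / 2 direct links

print(f"hub utilization: {utilization:.0%}")
print(f"direct links needed for {len(vpcs)} VPCs: {len(links)}")
```

In this toy model the hub is oversubscribed as soon as the flows sum past its ceiling, while each direct link only carries its own pair's traffic. The cost of the mesh is that the number of links grows quadratically with the number of VPCs, which is one reason colocating services in fewer VPCs is also on the list.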

(This isn't to criticize Slack's engineering team. They have successfully scaled their service in a short time, and I'm happy with their product overall, and with their transparency in this report. But I think AWS has the world's biggest and most scalable network fabric - it's just a matter of knowing how to harness it.)