by _kp6z 2801 days ago
A more plausible explanation is that the Xen networking path is simply expensive, the Intel VFs are limited by queue count and silicon (i40e isn't a great ASIC), and the Annapurna part is really an ARM64 NPU. NPUs have been abandoned by most silicon vendors and have a tragic history. It's simply hard to make NPUs work right at an attractive price/power/performance point and at high speed, compared with fixed-function scatter/gather I/O units coupled with general-purpose CPUs running software network stacks. The only benefit Annapurna gives EC2 over a software device model is a hard security boundary, effectively another computer inside the computer, for Nitro metal as a service. I think this is one reason why EC2 is limited to 25G while 100G has been commodity for a long time.

Here is a demonstration of a software stack that can scale toward hardware limits without relying on a particular vendor: https://www.slideshare.net/SeanChittenden/freebsd-vpc-introd.... This approaches 100G line rate for large packets, which is what it was optimized for. I don't know the PPS at small packet sizes, but I do know what would be required to optimize that use case, and it could be done pretty quickly.

6 comments

"I think this is one reason why EC2 is limited to 25G while 100G has been commodity for a long time."

Interestingly, the ENA driver has #defines for speeds up to 400 Gbps.

My guess as to why EC2 instances are limited to 25 Gbps is that it's a matter of balancing overprovisioning and the need to avoid having a single instance eat too much of a rack's bandwidth. I don't know how much bandwidth they have going to each rack, but there's a limit to how much it makes sense to provision; if typical bandwidth is on the order of 10 Gbps per rack (say, 80 instances pushing 125 Mbps on average) then you might want to provision 200 Gbps/rack and limit each instance to 25 Gbps rather than provisioning 1 Tbps/rack and limiting each instance to 100 Gbps.

(Numbers above are completely invented; I don't have any internal knowledge of how Amazon's networks or datacenters are set up.)
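The back-of-the-envelope comparison above can be sketched in a few lines of Python. This uses only the invented figures from the comment (80 instances, 125 Mbps average, 200 Gbps vs. 1 Tbps per rack); the function names are hypothetical and nothing here reflects how Amazon actually provisions racks.

```python
def headroom_over_typical(rack_uplink_gbps, typical_gbps):
    """How many times the typical load the rack uplink could carry."""
    return rack_uplink_gbps / typical_gbps


def max_instance_share(per_instance_cap_gbps, rack_uplink_gbps):
    """Fraction of the rack uplink a single instance can consume at its cap."""
    return per_instance_cap_gbps / rack_uplink_gbps


# Invented numbers from the comment above, not real AWS figures.
instances = 80
typical_gbps = instances * 0.125  # 80 instances at 125 Mbps each = 10 Gbps

# Option A: 200 Gbps per rack, 25 Gbps per-instance cap.
headroom_a = headroom_over_typical(200, typical_gbps)
share_a = max_instance_share(25, 200)

# Option B: 1 Tbps per rack, 100 Gbps per-instance cap.
headroom_b = headroom_over_typical(1000, typical_gbps)
share_b = max_instance_share(100, 1000)

print(f"typical load: {typical_gbps} Gbps")
print(f"A: {headroom_a:.0f}x headroom, one instance can take {share_a:.1%}")
print(f"B: {headroom_b:.0f}x headroom, one instance can take {share_b:.1%}")
```

Under these made-up numbers a single instance can take a similar share of the rack in both options, so the difference is mostly how much rarely used capacity gets provisioned.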

Most large operator datacenters are converging toward things like Clos and fat tree networks that provide abundant bandwidth at acceptable cost and with minimal blocking. Switch silicon vendors have really done yeoman's work pushing the envelope to make this possible and inexpensive. AWS might have such magnitude of machine count and generally low customer resource utilization that they can oversubscribe a lot, but it would be pretty silly to only bring in 200 Gbps to a rack post-2014, when the Broadcom Tomahawk switch ASIC became dominant.
James Hamilton talked about their commitment to 25GbE hardware at re:Invent in 2016. Fast forward to ~23m. https://youtu.be/AyOAjFNPAbA
I'm slightly confused, as you are talking about both AWS Nitro and Xen. I know Nitro moved off of Xen and was roughly based on KVM.

Also, are you talking about Annapurna in its pre-acquisition form or the new one? AWS talks about new custom ASICs and multiple ARM SoCs in their Nitro system.

The comment is quite clear: there are three networking technologies in use at Amazon. Nitro was never Xen; Nitro is KVM with Annapurna add-in cards.
Agreed - I read this and saw XPS being the culprit writ large.

AWS aren’t alone in this, and actually do pretty darn well compared to their competition - we had a nightmarish time a few years back with exactly this with a VPS provider - half of every second the traffic to the memcached cluster would just stop. Turned out they’d set hard limits on packets/sec to avoid oversaturating the host, so the advertised Gbps interconnect was actually 50Mbps when you saturated the packet scheduler.

Interesting theories on the EC2/Annapurna situation.

Do GCP, Azure, or any other cloud providers offer 100G networking?

Not to the instance, AFAIK. Google Cloud maxes out at ~20Gbps, and I think Azure does ~40Gbps.
It's really about flows as well, not necessarily total throughput.

AWS Nitro allows 5 Gbps per flow, and then maxes out at 25 Gbps per instance. I know GCP does something similar.
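A minimal sketch of what a per-flow cap plus an aggregate cap implies, using the 5 Gbps and 25 Gbps figures quoted above (taken from the comment, not verified against AWS documentation). The function name is hypothetical, and it assumes every flow can actually saturate its own cap.

```python
def max_throughput_gbps(n_flows, per_flow_cap_gbps=5.0, instance_cap_gbps=25.0):
    """Best-case aggregate throughput when each flow is individually capped
    and the instance as a whole is capped as well."""
    return min(n_flows * per_flow_cap_gbps, instance_cap_gbps)


for n in (1, 2, 5, 10):
    print(n, "flows ->", max_throughput_gbps(n), "Gbps")
```

With these figures, a single TCP connection never sees more than 5 Gbps, and at least five parallel flows are needed before the instance limit becomes the bottleneck.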

Also, I'm pretty sure that is false regarding Azure. They have limited availability of InfiniBand, but that is not on their general compute platform and has a narrow use case and many restrictions. In my experience Azure has had the worst networking performance and only had 10GbE NICs (it's been a while, though).

Sounds like a marketing blurb but from just a few days ago:

"Azure is breaking the speed barrier in cloud connectivity. ExpressRoute Direct provides 100G connectivity for customers with extreme bandwidth needs. This is 10x faster than other clouds."

https://azure.microsoft.com/en-us/blog/azure-networking-fall...

Can somebody confirm that? I have a use case in mind.

I do not believe ExpressRoute is instance level, so not directly relevant to this discussion.
100G line rate with large packets is only 8 Mpps; at that packet rate, 64-byte packets give you only ~5G.
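The arithmetic behind those figures can be checked roughly as follows. This is a sketch that assumes "large packets" means full-size 1518-byte Ethernet frames (1500-byte MTU plus 14-byte header and 4-byte FCS) and that counts the 20 bytes of preamble, start-of-frame delimiter, and inter-frame gap per frame; the comment does not state either assumption, and the function names are illustrative.

```python
# Per-frame wire overhead: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap.
ETHERNET_OVERHEAD_BYTES = 20


def packets_per_second(line_rate_bps, frame_bytes):
    """Maximum frame rate a link can carry for a given frame size."""
    return line_rate_bps / ((frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8)


def wire_rate_bps(pps, frame_bytes):
    """Bandwidth consumed on the wire by a given frame rate and frame size."""
    return pps * (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8


# Assumed frame sizes: 1518 bytes for "large", 64 bytes for minimum-size frames.
large_pps = packets_per_second(100e9, 1518)
small_rate = wire_rate_bps(large_pps, 64)
small_pps_at_line_rate = packets_per_second(100e9, 64)

print(f"100G with 1518-byte frames: {large_pps / 1e6:.2f} Mpps")
print(f"same packet rate with 64-byte frames: {small_rate / 1e9:.2f} Gbps")
print(f"100G with 64-byte frames would need: {small_pps_at_line_rate / 1e6:.1f} Mpps")
```

Under those assumptions, a stack that tops out near 8 Mpps would need roughly an 18x higher packet rate to fill 100G with minimum-size frames.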
It's not so much the size of the packets as it is having flows that can be vectored through the packet processing stack in batches. This is obviously easier to ensure as a sender and a receiver than something like a bump in the wire deep packet inspector unless it doesn't keep stateful data.