by mabbo 1005 days ago
Many years ago when I was a junior dev at Amazon, there was a massive project internally to split up every internal system into regional versions with limited gateways allowing calls between regions. The reason? We had run out of internal IPv4 addresses.

The Principal PM in charge of the "regionalization" effort was asked in a Q&A "why didn't we just switch to IPv6?".

Her answer was something along the lines of "The number of internal networking devices we currently have that cannot support IPv6 is so large that to replace them we would have needed to buy nearly the entire world's yearly output of those devices, and then install them all."[0]

It's easy to presume malicious intent on the IPv4 front from Amazon, but with so many AWS systems being on the scale they are at, I find it easy to believe that replacing all of the old network hardware may just be a project too large to do on a short timescale.

[0] - At least, that's my memory of it. I'm sure that's not an entirely accurate quotation.

13 comments

Can you remember what year it was?

I’ve got a slight suspicion you were given some bullshit or at least a creative treatment of facts e.g. everything had IPv6 support but FUD-filled network engineers didn’t want to turn it on.

Most network devices I’ve encountered were dual-stack way before anyone I knew seemed to care about actually using IPv6 — I always assumed it was added for US government/military requirements.

From memory, the regionalization project ran from approx 2014 to 2015 or 2016.

There were also other reasons given, like the amount of internal software that used e.g. IPv4 addresses. Also, AWS likes to have 'lots of small things' instead of one big thing (regions, AZs, cells, two pizza teams, no (official) monorepo) so regionalization was part of that.

Another big reason for regionalization, other than IPv4 exhaustion, was that AWS promises customers that AWS regions are completely separate, but with one giant network, it turned out there were all sorts of services making calls between regions that nobody had realized. I have a couple of funny examples, but that might make me too identifiable :)

My favorite region-isolation oversight was when someone realized that the Perl cron job that iterated over every border router globally and applied ACL updates 2-3x per day didn't pay attention to isolation at all, and could easily have started blackholing the entire network one device at a time if someone configured a bad rule.

The mitigation was to sort routers by hostname which began with the regional airport codes (iad, pdx, etc.), and pause for 15 minutes each time the first three letters changed to give folks on-call time to react.
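A minimal Python sketch of that mitigation, assuming (as described) that every router hostname starts with a three-letter airport code. The hostnames, the `apply_acl` callback and the function name are hypothetical; this is not the original Perl job.

```python
import itertools
import time
from typing import Callable, Iterable, List


def staged_acl_rollout(
    hostnames: Iterable[str],
    apply_acl: Callable[[str], None],
    pause_seconds: float = 15 * 60,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Apply ACL updates region by region, pausing whenever the region changes.

    Assumes each hostname begins with a three-letter airport code (e.g. "iad", "pdx").
    Returns the regions in the order they were updated.
    """
    # Sorting by hostname keeps routers with the same airport-code prefix contiguous.
    ordered = sorted(hostnames)
    regions_done: List[str] = []
    for index, (region, routers) in enumerate(
        itertools.groupby(ordered, key=lambda name: name[:3])
    ):
        if index > 0:
            # The first three letters changed: give on-call time to notice a bad rule.
            sleep(pause_seconds)
        for router in routers:
            apply_acl(router)
        regions_done.append(region)
    return regions_done
```

Passing `sleep` in as a parameter lets the rollout order be exercised without actually waiting 15 minutes per region.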

Oh wonderful. 15 minutes to get the page, put down my beer, get on my computer, sign in to everything, get 2-factored 3 times AND figure out exactly what’s happening and fix it.
Chop chop!
This really would not have been true for vendor network gear of the sort AWS had been buying for years by 2014. It's possible that their own switches or the weird fabric they have internally wouldn't have worked with v6, or there were Annapurna NIC ASIC issues, but their primary vendors all would have been fine.

I'm not saying there aren't v6 issues (for some vendors, resource exhaustion might have come into play) or bugs, but there's no way it's that massive a problem. There are huge and complex all v6 networks all over the planet that have more stringent requirements (by law) than AWS DCs.

Facebook started its transition to make everything* internally IPv6 slightly before then.

It was indeed a lot of work. But worth it.

* When I was there we still had a handful of weird things that couldn't be made IPv6. If you needed to access such things you could get a dual-stack dev server.

You're talking about snowfort, and while IP exhaustion was one reason, it was also an isolation/fault tolerance/security thing.
Indeed, blast radius is a real concern that a lot of folks who try to imitate AWS have to learn about the hard way.
Tell me more about these "pizza teams".
The idea is internal teams should be no bigger than what can be fed by 2 pizzas.
But I don't like working alone :(
slam dunk.
Badum tsshhhh
It’s unfortunate when you have big eaters in your team, but I suppose you can just scale up your pizza.

Pepperoni.16xlarge

oh

so they don't own 2 pizzerias? :(

ssh’ing through bastions was such a pain! We used the JMX GUI to review some AMP details from time to time, and port forwarding through the bastions was frowned upon, but our workflow was broken, what were we to do?

IIRC, early in that project the gateways would get overwhelmed by the volume of traffic they were handling between various VPCs, and had to be rolled back several times.

Of all the transitions I dealt with at Amazon, snowfort may have been my least favorite (though the ACL/role migration was pretty frustrating as well).

Sure, everything supports IPv6 -- until you turn it on and rediscover the tickets that have been sitting at the bottom of the JIRA for the last decade.
As a matter of fact, Ron Broersma, who is affiliated with Space and Naval Warfare Systems Command (SPAWAR), has a list of equipment that should be fully IPv6-only compliant, including various management interfaces and more. The US Navy supposedly tests this in house on an IPv6-only network. Four years on, I imagine the situation has only gotten better: https://www.youtube.com/watch?v=9kQje5gSWw8

Also, I imagine AWS now builds the majority of its NICs and switches in-house. For all we know, the underlay network could be IPv6 or totally custom (but it's probably IPv4).

Cool! I'm glad the military is pushing the internet forward, I guess some things never change :)

As for AWS, I tend to agree with the sibling post and your supposition about IPv4. Everything out of the Amazon organization is aggressively, err, "minimal."

It's their baby lol
I believe the issue wasn't of IPv6 support generally, but of issues with TCAM space and the increase in routing table size moving from v4 to v6. Overflowing TCAM would cause routing to hit the CPU which would immediately lead to outages.

Tables were relatively large internally because AWS was all in on Clos networks at that point. And the devices used to build those Clos networks were running Broadcom ASICs, not Cisco or other likely vendors.
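To illustrate why table size matters here: TCAM does longest-prefix match in hardware, and per the comment above, routes that don't fit end up on the slow CPU path. The toy model below is a hypothetical sketch, not how any real ASIC (Broadcom or otherwise) allocates TCAM: it simply charges each route the full width of its address family (32 bits for IPv4, 128 for IPv6). The budget and next-hop names are made up.

```python
import ipaddress
from typing import Dict, Optional, Tuple, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


class ToyForwardingTable:
    """Longest-prefix-match table with a fixed 'hardware' bit budget.

    Hypothetical cost model: each route costs the full width of its address
    family (32 bits for IPv4, 128 for IPv6). Routes that don't fit spill into
    a software table, standing in for the slow CPU path.
    """

    def __init__(self, budget_bits: int) -> None:
        self.budget_bits = budget_bits
        self.used_bits = 0
        self.hardware: Dict[Network, str] = {}
        self.software: Dict[Network, str] = {}

    def add_route(self, prefix: str, next_hop: str) -> bool:
        """Install a route; return True if it fit in the hardware budget."""
        net = ipaddress.ip_network(prefix)
        cost = net.max_prefixlen  # 32 for IPv4, 128 for IPv6
        if self.used_bits + cost <= self.budget_bits:
            self.hardware[net] = next_hop
            self.used_bits += cost
            return True
        self.software[net] = next_hop
        return False

    def lookup(self, address: str) -> Tuple[Optional[str], Optional[str]]:
        """Return (next_hop, path) for the longest matching prefix, or (None, None)."""
        addr = ipaddress.ip_address(address)
        best: Optional[Tuple[int, str, str]] = None
        for table, path in ((self.hardware, "hardware"), (self.software, "cpu")):
            for net, next_hop in table.items():
                # `in` is simply False when the address and network families differ.
                if addr in net and (best is None or net.prefixlen > best[0]):
                    best = (net.prefixlen, next_hop, path)
        if best is None:
            return None, None
        return best[1], best[2]


table = ToyForwardingTable(budget_bits=160)
table.add_route("10.0.0.0/8", "spine-a")      # 32 bits, fits
table.add_route("2001:db8::/32", "spine-b")   # 128 bits, exactly fills the budget
table.add_route("10.1.0.0/16", "spine-c")     # no room left: spills to the CPU path
print(table.lookup("10.1.2.3"))               # ('spine-c', 'cpu')
print(table.lookup("10.2.0.1"))               # ('spine-a', 'hardware')
```

Under this cost model the same 160-bit budget holds five IPv4 routes but only one IPv6 route, which is the kind of pressure the comment describes when moving a large internal table from v4 to v6.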

Right, if you worked at Amazon and didn't have incentive, then, you didn't do it. It was part of your job to not do things which you were not incentivized to do.
Just swap Amazon for any other company name and the sentence is still correct. People do what they are paid to do.
Right?? How old a device would you have to get to NOT have IPv6 support?

EDIT: But maybe bugs, IDK.

If Amazon is your customer, you fix the bugs; if you're Amazon using your in-house kit, you fix your own bugs whenever you want to. There are plenty of real reasons not to do IPv6, but they are virtually all politics and possibly operational ("we'd have to train our people, and we don't spend money on that"). The idea that it was a vendor issue is a BS trope that's been around for at least a decade, if not two.
> FUD-filled network engineers

FUD sounds like a mean way to say unproven in production

I remember the regionalisation, that was "fun" to be on the sidelines for (I was in a newer service that was regionalised from the get-go). I don't remember who the PM was for that one, but I remember that being when I truly came to respect the value that a TPM can add.

You're right about the cost and need to replace network equipment being one of the strong reasons why they didn't. Amazon used its own in-house designed and built network gear for a variety of reasons (IIRC there's a re:Invent talk about it), which I'm sure is still the case. Every single one of those machines had fixed memory capacity and would have needed to be replaced to get enough memory to handle IPv6 routing table needs etc. What they had wouldn't even have been enough if they'd chosen to go IPv6-only (which they couldn't have reached except via dual-stack IPv4/IPv6 anyway).

Were they also by chance considered accelerators for encrypted traffic?

I'm not privy to details, but I recall once when a mandate was issued to a Java platform to remove an outdated encryption protocol (mandated by Amazon Infosec). The change was made and rolled out with little fanfare.

A few weeks later, a large outage of Amazon Video (which used said platform) occurred on a Friday evening. Root cause? The network hardware accelerators were only set up to use that outdated protocol, which in turn meant that encryption was happening in software instead. Under load, the video hosting eventually caved.

Might be specific to the hardware used for Amazon retail, but it reinforces the point of their home grown (and now aging) stack.

Maybe not the same story, but there was a sidecar service for encrypting traffic and doing access control and other things in a way that was transparent to the app (like Envoy, but without the mesh and much earlier). The original version was written by (maybe) a single engineer in Erlang. Version two was given to another team and rewritten in Java because. They had never tested it at scale, and every team I know of that went to production with it fell over. There was some company-wide deadline, but it was unusable at that point, and the teams I was working with were gun-shy about trying it again since it was obvious that the owning team had no idea what its performance characteristics or system requirements were.

I think I switched teams before that was resolved and moved to some greenfield work where we didn't have to worry about scale for a while, but I do believe they eventually figured it out.

I believe the PM was Laura Grit, who was actually a TPM. Laura is a Distinguished Engineer now. She seems to constantly do massive-scale projects; IPv4 is one of the smaller ones now. Sadly I can't share some of the big projects she's doing now. I've gotten some sage advice from her on the few occasions she had time, and I appreciate it.
> the PM was Laura Grit

Talk about nominative determinism...

Imagine never being able to be lazy about anything because the jokes are such a layup.
Yep, she was behind regionalization and IPv6 and such. I recall reading the same thing the parent comment talks about.
> replacing all of the old network hardware may just be a project too large to do on a short timescale.

If that is the case, then Amazon should hold off on charging for IPv4 on a short timescale until they have replaced all the old hardware and can support IPv6 internally everywhere.

True. But if they are having a problem getting that done, adding a surcharge is a good way to get bottom-up pressure on AWS teams to finish the job.
this doesn't forgo v6 phase-in though, can't kick that can down the road forever.

surely they started the process...

right? i cannot imagine AWS just sticking head in the ground and ignoring this...

No one is ignoring it, and the US Government has done everyone another favour on this score. Years ago in the late Bush / early Obama administration, NIST required that all federal government agencies have IPv6 at the border. Federal government money is not to be sniffed at, and that had the effect of forcing a number of vendors to add IPv6 support. A few years after that, the requirement became that federal agencies needed to be dual-stack IPv4/IPv6.

About 18 months ago, the requirement came that federal agencies are required to be IPv6 Only, dropping the dual stack. IIRC they have until 2025 to do that. This has the neat effect of forcing all vendors to make IPv6 a first class citizen. The extra little fun from this is that it applies to the military JWCC contract that all the major clouds have been trying to land. The timescales of JWCC meant that initial offerings are pretty bare, but that won't be allowed to last.

Yep.

I work at a federal entity tied to DoE, and that's the biggest workstream cut out for us. 90% of our environment is either dual-stacked or IPv6-native. We would love to kick IPv4 out from under us and go full IPv6. The problem is that the vendors, who are largely private, don't have the same mandate, so there are varying degrees of "we support IPv6", which makes planning a bit more difficult (especially at the discovery stage).

>Problem is that the vendors who are largely private don't have the same mandate

They get to decide how much that sweet federal $$$$ is worth to them. For most vendors, it's hopefully worth too much to ignore.

Yes they are working on it. A number of services already support v6, more to come.
1 is a number.

0 is also a number.

I can believe that, but also, places like Google and Facebook saw the problem of having >1 million devices and not enough IP addresses, and moved to IPv6.
Hanlon's razor applies here.

There is no reason any company of any size should run out of IPv4 addresses internally, IF they are doing proper IP management. If I were to wager a guess I'd say there was a lot of waste going on, issuing /24s or larger to teams when all they need are /29s etc. It adds up over time. Once they exhaust private IP space they can always buy more at auction. They are Amazon after all, there's no shortage of money. This is just mismanagement of resources.
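To put numbers on the /24-versus-/29 point, a quick check with Python's standard `ipaddress` module. The `10.20.0.0/24` allocation and the helper names are made-up examples, not anything from Amazon's actual address plan.

```python
import ipaddress


def usable_hosts(net: ipaddress.IPv4Network) -> int:
    """Host addresses in a classic IPv4 subnet (network and broadcast excluded).

    Only meaningful for prefixes up to /30; /31 and /32 are special cases.
    """
    return net.num_addresses - 2


def blocks_per_allocation(allocated: str, new_prefix: int) -> int:
    """How many /new_prefix subnets fit inside the `allocated` block."""
    net = ipaddress.ip_network(allocated)
    if new_prefix < net.prefixlen:
        raise ValueError("new_prefix must be at least as long as the allocation's prefix")
    return 2 ** (new_prefix - net.prefixlen)


team_block = ipaddress.ip_network("10.20.0.0/24")  # hypothetical per-team allocation
small_block = ipaddress.ip_network("10.20.0.0/29")

print(usable_hosts(team_block))                   # 254
print(usable_hosts(small_block))                  # 6
print(blocks_per_allocation("10.20.0.0/24", 29))  # 32
print(blocks_per_allocation("10.0.0.0/8", 24))    # 65536 teams at one /24 each
print(blocks_per_allocation("10.0.0.0/8", 29))    # 2097152 teams at one /29 each
```

Handing a team that needs six hosts a /24 leaves 248 host addresses idle, and 10/8 runs out after 65,536 such allocations rather than about two million.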

Comcast has 29.6 million Internet subscribers: https://expandedramblings.com/index.php/comcast-statistics/

If you wanted to assign a single non-routable IP in the 10/8 space to each of those cable modems, they would be 13 million IPs short.
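The arithmetic behind that, using Python's standard `ipaddress` module and the 29.6 million subscriber figure quoted above. It counts every address in each block (including network and broadcast addresses), so if anything it understates the shortfall; even pooling all three RFC 1918 private ranges doesn't close the gap.

```python
import ipaddress

SUBSCRIBERS = 29_600_000  # Comcast subscriber count cited in the comment

ten_slash_8 = ipaddress.ip_network("10.0.0.0/8")
rfc1918 = [
    ipaddress.ip_network(p)
    for p in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

print(ten_slash_8.num_addresses)                # 16777216
print(SUBSCRIBERS - ten_slash_8.num_addresses)  # 12822784 short with 10/8 alone

all_private = sum(n.num_addresses for n in rfc1918)
print(all_private)                              # 17891328
print(SUBSCRIBERS - all_private)                # 11708672 short with all of RFC 1918
```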

Can you elaborate on proper IP management? Isn't that sort of what the parent post is talking about with splitting the network into regional chunks?

I'd imagine few service teams at Amazon would get very far with a /29, let alone a /24, if they have to put all their stuff on that.

My one issue with this is if it’s such a large lift, why burn the effort to just kick the can down the road? IPv6 has to happen at some point (and for AWS that point is sooner than most).

The better reason is the regionalization was probably a way to decrease blast radius in case of a service failure.

Also, AWS definitely did not regionalize all their services in 2016: not IAM, and certainly not DNS/Route 53 (part of the reason why they had their massive failure in US East 1 2-3 years ago).

I upgraded a P2P networking library recently to add support for IPv6. That was a pure software solution and it required a lot of work. When you have to upgrade hardware as well, I can imagine it would present a massive challenge (especially logistically). You'd have to upgrade ALL the hardware before you even start thinking about the software side of the equation.
So basically, their IPv4 infrastructure investment is so entrenched that they're trapped.

Sounds like a perfect opportunity for a market upstart to start out v6-only...

Out of IP addresses? Just use NAT.
32-bit IPv4 addresses are wasteful. By leveraging NAT, we can get away with a 1-bit addressing scheme and save 31 bits per packet!
Out of NAT sockets? Just use more IP addresses.
Hah, I worked on the hardware loadbalancer team during that period. Fun times.
Even cheap consumer hardware supports IPv6. There are significant financial incentives to keep treating IPv4 addresses as capital. Like NFTs, they are an artificially limited asset: creating more addresses means more competition and a loss of that capital. Therefore they will spend billions on continually reworking internal IPv4 rather than going for the proper solution.
You obviously have never been on the backend of a big enterprise deployment.

The world is bigger than your apartment.

I worked in a company where we had network equipment all over the world.

Often the IPv6 and IPv4 paths were entirely different and latency on IPv6 was much higher, so we had to measure latency between nodes on both. Also, sometimes IPv4 routing was symmetric but IPv6 wasn't. As a result, we had to buy tons of IPv4 addresses.

Our control plane was on IPv6, but data-plane had to be on both.