by steelframe 2033 days ago
> Cellularization is an approach we use to isolate the effects of failure within a service, and to keep the components of the service (in this case, the shard-map cache) operating within a previously tested and operated range. This had been under way for the front-end fleet in Kinesis, but unfortunately the work is significant and had not yet been completed.

Translation: The eng team knew that they had accumulated tech debt by cutting a corner here in order to meet one of Amazon's typical and insane "just get the feature out the door" timelines. Eng warned management about it, and management decided to take the risk and lean on on-call to pull heroics to just fix any issues as they come up. Most of the time yanking a team out of bed in the middle of the night works, so that's the modus operandi at Amazon. This time, the actual problem was more fundamental and wasn't effectively addressable with middle-of-the-night heroics.

Management rolled the "just page everyone and hope they can fix it" dice yet again, as they usually do, and this time they got snake eyes.

I guarantee you that the "cellularization" of the front-end fleet wasn't actually under way; the teams were instead completely consumed with whatever the next typical and insane "just get the feature out the door" thing was at AWS. The eng team was never going to get around to cellularizing the front-end fleet because they were given no time or incentive to do so by management. During/after this incident, I wouldn't be surprised if management yelled at the eng team, "Wait, you KNEW this was a problem, and you're not done yet?!?" without recognizing that THEY are the ones actually culpable for failing to prioritize payments on tech debt vs. "new shiny" feature work, which is typical of Amazon product development culture.

I've worked with enough former AWS engineers to know what goes on there, and there's a really good reason why anybody who CAN move on from AWS will happily walk away from their 3rd- and 4th-year stock vest schedules (when the majority of your promised amount of your sign-on RSUs actually starts to vest) to flee to a company that fosters a healthy product development and engineering culture.

(Not to mention that, this time, a whole bunch of people's Thanksgiving plans were preempted by the demand to get a full investigation and post-mortem written up, including the public post, ASAP. Was that really necessary? Couldn't it have waited until next Wednesday or something?)

2 comments

Ow, this was traumatizing to relive.

Yes, this is exactly how product development works at many (if not most) places within Amazon for engineers. It can be this toxic.

Disclaimer: Amazon engineer

Hahaha, well this time Jessy got paged so yeah... the summary got priority over turkey.