Hacker News
by ohduran 925 days ago
Not to downplay the absolute behemoth of a task they managed to pull off successfully... but why not upgrade as new versions came along, with less fanfare?

It is a great read, but I can't shake the feeling that it's about a bunch of sailors who, instead of going around a huge storm, decided to go through it knowing full well that it could end in tragedy.

Are small upgrades out of the question in this case? As in "each small one costs us as much downtime as a big one, so we put it off for as long as we could" (they hint at that in the intro, but I might be reading too much into it).

3 comments

OP here - we would have used the same approach for the minor upgrades. This isn’t a case of “we procrastinated ourselves into a corner” so much as a matter of “if it isn’t broke, don’t fix it,” while recognizing we would need to make the jump eventually.
Just for your information, Aurora Postgres does now claim increased resilience across minor upgrades, though there are some caveats despite the Zero Downtime naming: https://docs.aws.amazon.com/AmazonRDS/latest/AuroraUserGuide...

I've relied on this as the minor upgrade method since it became available and it has worked as advertised, with no perceptible issues. This may obviously be traffic- and operation-dependent, but it's worth having a look at.

Worth saying we do the minor upgrades incrementally, intra-day and a few weeks to a month after they are available, as a matter of routine, with a well-documented process. Overhead is minimal to practically zero.

Upgrading N versions is just as much of a threat to availability regardless of whether N is 1 or 3.
Each one incurs some downtime. If their real answer is less than 60 seconds, then they’d have incurred that multiple times on the road to 15.
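
To put rough numbers on that, here is a minimal sketch of the cumulative cost. The 60-second figure is the upper bound mentioned above; the count of three intermediate upgrades and the function name are purely assumptions for illustration, not anything the article states.

```python
def cumulative_downtime_seconds(upgrades: int, per_upgrade_seconds: float) -> float:
    """Total downtime if every upgrade costs the same fixed outage."""
    if upgrades < 0 or per_upgrade_seconds < 0:
        raise ValueError("inputs must be non-negative")
    return upgrades * per_upgrade_seconds


# Hypothetical comparison: three incremental upgrades vs. one big jump,
# each assumed to cost at most 60 seconds of downtime.
incremental = cumulative_downtime_seconds(3, 60)
single_jump = cumulative_downtime_seconds(1, 60)
print(incremental, single_jump)
```

Under that assumption the incremental path pays the outage three times over. If each upgrade costs roughly the same regardless of how many versions it spans, fewer upgrades means less total downtime.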