by ghomem 638 days ago
Well that's exactly the point! Creating complex cloud resources with, for instance, Terraform, is less reproducible than a shell script on an LTS system like Ubuntu or RHEL. That's because the cloud provider interfaces drift and from time to time stop accepting the Terraform manifests that previously worked. And to fix it, you have to interrupt your normal work for yet another unplanned intervention in the Terraform code. This happened to my teams several times.

This does not happen with Puppet + Linux, because LTS distributions have a long release cycle where compatibility is not broken.

I tried to explain this topic in the article linked above. Not sure how far I succeeded.

Leaning into LTS is nice until you near EOL and have to migrate everything in an often Herculean effort to work with the next LTS release.
Like 12 years of life cycle is not enough for you to plan a transition?

You can use the entire life cycle but no one is forcing you to. You can update from one LTS to another every 2 years, or 4 years, or 5 years... you decide.

I don't really think we're in disagreement here. The longer you wait, the harder the transition will be. LTS is a good foundation, and usually the right choice for "enterprise" or "business" settings, but you should not rely overmuch on any one LTS release's way of doing things, when the wider Linux ecosystem moves much faster.
The longer you wait the harder the pain. The less you wait the more frequent the pain. So it depends on the function that converts intensity and frequency to suffering :p But, most importantly, the fact that LTS gives you a choice is what I was highlighting.

For the scope I operate, which is pretty standard Linux packages (PostgreSQL, MariaDB, Nginx, Docker, OpenVPN, OpenSSH) the changes between 16.04 and 22.04 have been quite OK to deal with.

It's a tradeoff: a big effort once every 4 or 5 years, vs a hopefully smaller effort every year. Sometimes the intermediate smaller steps help you move forward, sometimes it just means more migrations. Sometimes the software/hardware you need means you can't use an LTS OS at all.

If possible, it's nicer to pick established, mature software for as much of your stack as you can, so that there's less of a difference in APIs over longer time frames. But it's not always possible.

It's not terrible in my experience of doing it several times now.

It is definitely less terrible than trying to unfuck tangles of terraform / terragrunt / yaml / bits of cloud infra.

I went through the migration from CentOS 6 to 7 and never want to do anything like that again. The good news, I guess, is that it never will happen again: CentOS is basically dead anyway, and it's not likely that so many core pieces of system software will change that drastically anymore.
I did CentOS 3 -> 4 -> 5 -> 6 -> 7 -> Debian. Very few problems.

(30 nodes)

I can't imagine you leaned into any one of those releases, then. That sequence involves major changes to the kernel, the init system, the configuration management tools, the core libraries, Apache, Python, Perl, etc. Any one of those alone could (and did, in my experience) trigger a major rewrite of configuration and/or code.

I'm glad it was painless for you. In my experience, it was not, and most of the reasons were beyond my control.

What does lean into mean here? A lot of software from 20 years ago compiles (if needed) and runs fine on the latest versions.
apache -> nginx. Python versions. postgres. All fine.
Did you crossgrade to Debian in-place?
What is it that people do that breaks so often due to lack of backwards compatibility from the OS?

IMO, the lure of an LTS is that you don't need to keep testing whether your computer is still working every week when a set of updates comes in. It's not that the details your software depends on remain frozen. If your software depends on the details of something, you should add it as a dependency.

The bigger problem IMO is not that things break, it's that if you depend on one LTS release too heavily, and you wait too long to migrate from one LTS to another, everything breaks all at once.

What should be a gradual migration as new things develop turns into a singular nightmare.

What are you depending on from the OS that isn't extremely backwards compatible?

Once in a decade you get something like a breaking upgrade of nginx, or the glibc debacle of 2003. That may take a person-week to fix[1], which can hardly be called "herculean".

1 - That's if you go with 1 person * 1 week. If you try to go with 7 people * 1 day, it will suddenly cost 7 person-weeks. But the only way upgrading becomes that much of a hurry is if you borked a lot of things prior to it.

Off the top of my head, some of the things that have broken at an LTS transition that I've been involved with are out-of-tree kernel module builds, C code using OpenSSL, Puppet config, Salt config, RPM specfiles, Python code, Perl code, Apache configs, shell scripts, Java code, bootloader configs, bootstrap scripts, and init scripts/configs (esp. sysvinit to systemd). Any one of these things is not a problem in isolation; the problem is having to fix all of them at once. Too much complexity put into any one of them (often arising from external requirements or rushed implementations) also makes migrating harder. Waiting until the 11th hour on the EOL clock just adds to the stress of the process.

Many of my bad experiences were because of corporate policies and lack of proper prioritization at levels above system administration. However, the sysadmin does have some choice in the matter, especially when greenfielding. You can turn stability into a vice if you're not careful.