Hacker News
by bogantech 1064 days ago
> I'd like to see the RHEL stability model go away too and force people to complete their automation and solve the problems of being able to rebuild on demand - and actually doing it.

Whenever there's a new distribution release, it invariably breaks a bunch of things in the automation, and you spend more time massaging your playbook so it works again than it would have taken to do it by hand.


systemd threw the biggest wrench, by far, in my automation workflows (this was before containers came to dominate a lot of the landscape, so everything was managed with init scripts). I like it now, but it also broke things quite frequently in the early days, and there was a looong period of time when you had to shim software to work with systemd.

But even still, things like snaps, the way Debian handles system Python, and various little changes that have an outsize effect on automated deployment do cause a good amount of churn with automation.

Yup, enterprise Linux insulates you from unneeded change (in the business context). For most companies, systemd will have no impact on the bottom line vs sysvinit vs whatever.

However, paying an extra engineer to sort through all the changes possibly will.

On the other hand, there are some interesting trends, like monokernels and minimal OS images that leverage services running off-machine (DNS, SMTP, federated login) instead of expecting so many local services, which removes some of the complexity/volatility.

> things like snaps, the way Debian handles system Python

These two things shouldn't both be an issue for anyone, only one or the other.

> Whenever there's a new distribution release, it invariably breaks a bunch of things in the automation, and you spend more time massaging your playbook so it works again than it would have taken to do it by hand.

The rolling-release life is that things break constantly, during each week’s upgrade, but only a little bit at a time (and hopefully in staging). I don’t know if this is better for system administration, necessarily, but if you’re used to a stable-release dynamic of heavy discrete breakage and piles of backported patches, then you might be imagining the same scale of breakage every upgrade, which isn’t the experience at all. So don’t discard the rolling-release option because of this preconception.

The difference is: If there is a weekly breakage on a weekly update, dealing with it is just part of the process and timed in.

If you only update every few years, each update becomes a full project distracting from and conflicting with other projects.

> The difference is: If there is a weekly breakage on a weekly update, dealing with it is just part of the process and timed in.

That entirely depends on your operations model. There's a difference between, say, a nuclear power plant and a colo web hosting shop. With the latter, sure, no problem risking "minor" weekly breakage. With the former, I'd much rather have scheduled, heavily tested and carefully monitored maintenance windows.

And HN tends to underestimate how many places like the former exist: backbones of global finance and telecom, industrial facilities of all kinds, etc.

A nuclear power plant's control software hopefully isn't connected to external systems, but fully isolated.

And yes, upgrading that is a full-blown project.

It is very different from a "living" software environment with ongoing development processes.

So how about a more down-to-earth example. Medical imaging.

Changes to graphics drivers can, do, and have impacted how things like MRI results get rendered by software. An imaging system is going to have at least some networking with the rest of the hospital and is difficult to completely airgap, but at the same time you cannot update it willy-nilly with the latest and greatest uncertified drivers.

That's precisely where "enterprise" distros are sometimes necessary.

The hospital there also isn't the organisation doing the development and shouldn't really care about the OS the appliance uses.

The vendor (Siemens or whoever) should make updating the OS part of their development cycle and certify accordingly. If drivers are a relevant part of the product there, they have to take enough ownership of the drivers to ensure they do the right thing.

Even this is probably an exaggerated example. Although, to your point, with thousands of hospitals, medical centers and imaging centers in the US, there are enough to consider it a common occurrence at a community level.

Another example would be systems set up for CAD. Certain levels of CAD require certified video drivers in order to ensure tolerances are met. Errors introduced at this level destroy product and possibly people.

At $dayjob, PACS devices use a separate VLAN and can only talk to their servers. Although to be fair, most of the problems with that system are down to the frontend using outdated JS/ASP/Java code that Edge/Chrome don't support anymore.

It is the countless places that exist at some level between your two extremes of the nuclear power plant and the colo web host that need Enterprise Linux.

My point when I mentioned rolling-release was that I’m very unsure whether they need Enterprise Linux and a bout of desperate firefighting every several years or a rolling-release distro and a small but respected team with a staging environment and a steady supply of handheld fire extinguishers.

I could be convinced the former were the answer in many cases, but I’ve never seen the argument for that move beyond a bare proclamation like you’ve just made. It may also be that the argument for this is so mired in the particulars of a situation that it essentially can’t be articulated, but for me that’s mutually exclusive with applying it to a broad, vaguely defined class of deployments like you’ve just done. (A complex argument is bound to produce an intricately shaped class.)

My (admittedly theoretical) fear in my initial comment was that the LTS people read about kernel updates every month, think of the breakage their LTS encounters on every kernel update, and nope out. Yet without the humongous pile of backported patches the LTS requires, kernel updates are the most benign thing in the world if you can afford the reboots; a decade of them with literally not a single issue can and does happen.

"I’ve never seen the argument for that move beyond a bare proclamation like you’ve just made."

Do you work in an industry that requires a specific service level to be maintained? How many hours per day? What does downtime cost? What does redundancy cost?

We have situations that cost millions of dollars per hour when downtime occurs. Now how do you protect those systems?

A definition of hubris is not believing something exists because you haven't seen it before.

Our task as systems administrators is not to make a career out of updating systems for maintenance patches. Our task is to design systems to minimize updates and then redesign systems to eliminate wasted user interaction and downtime.