Hacker News
by mananaysiempre 1064 days ago
> Whenever there's a new distribution release it invariably breaks a bunch of things with the automation and you spend more time massaging your playbook so it works again than it would have taken to do it by hand

The rolling-release life is that things break constantly, during each week’s upgrade, but only a little bit at a time (and hopefully in staging). I don’t know if this is necessarily better for system administration, but if you’re used to a stable-release dynamic of heavy discrete breakage and piles of backported patches, then you might be imagining the same scale of breakage on every upgrade, which isn’t the experience at all. So don’t discard the rolling-release option because of this preconception.


The difference is: if there is a weekly breakage on a weekly update, dealing with it is just part of the process and factored in.

If you only update every few years, each update becomes a full project distracting from and conflicting with other projects.

> The difference is: if there is a weekly breakage on a weekly update, dealing with it is just part of the process and factored in.

That entirely depends on your operations model. There's a difference between, say, a nuclear power plant and a colo web hosting shop. With the latter, sure, no problem risking "minor" weekly breakage. With the former, I'd much rather have scheduled, heavily tested and carefully monitored maintenance windows.

And HN tends to underestimate how many places like the former exist: the backbones of global finance and telecom, industrial facilities of all kinds, etc.

A nuclear power plant's control software hopefully isn't connected to external systems, but is fully isolated.

And yes, upgrading that is a full blown project.

It is very different from a "living" software environment with ongoing development processes.

So how about a more down-to-earth example: medical imaging.

Changes to graphics drivers can, do, and have affected how things like MRI results get rendered by software. Such a system is going to have at least some network connection to the rest of the hospital and is difficult to airgap completely, but at the same time you cannot update it willy-nilly with the latest and greatest uncertified drivers.

That's precisely where "enterprise" distros are sometimes necessary.

The hospital there also isn't the organisation doing the development and shouldn't really care about the OS the appliance uses.

The vendor (Siemens or whoever) should make updating the OS part of their development cycle and certify accordingly. If drivers are a relevant part of the product, the vendor has to take enough ownership of them to ensure they do the right thing.

This lack of ownership is exactly the problem with most IT departments today.

Even this is probably an exaggerated example. Although, to your point, with thousands of hospitals, medical centers and imaging centers in the US, there are enough of them to make it a common occurrence at the community level.

Another example would be systems set up for CAD. Certain levels of CAD require certified video drivers in order to ensure tolerances are met. Errors introduced at this level can ruin products and possibly hurt people.

At $dayjob, PACS devices use a separate VLAN and can only talk to their servers. Although to be fair, most of the problems with that system are down to the frontend using outdated JS/ASP/Java code that Edge/Chrome doesn't support anymore.

It is the countless places that exist somewhere between your two extremes, the nuclear power plant and the colo web host, that need Enterprise Linux.

My point when I mentioned rolling release was that I’m very unsure whether they need Enterprise Linux and a bout of desperate firefighting every several years, or a rolling-release distro and a small but respected team with a staging environment and a steady supply of handheld fire extinguishers.

I could be convinced the former is the answer in many cases, but I’ve never seen the argument for that move go beyond a bare proclamation like the one you’ve just made. It may also be that the argument for it is so mired in the particulars of a situation that it essentially can’t be articulated, but for me that’s mutually exclusive with it applying to a broad, vaguely defined class of deployments, as you’ve just claimed. (A complex argument is bound to produce an intricately shaped class.)

My (admittedly theoretical) fear in my initial comment was that the LTS people read about kernel updates every month, think of the breakage their LTS encounters on every kernel update, and nope out. Yet without the humongous pile of backported patches the LTS requires, kernel updates are the most benign thing in the world if you can afford the reboots: a decade of them with literally not a single issue can and does happen.

"I’ve never seen the argument for that move beyond a bare proclamation like you’ve just made."

Do you work in an industry that requires a specific service level to be maintained? How many hours per day? What does downtime cost? What does redundancy cost?

We have situations that cost millions of dollars per hour when downtime occurs. Now how do you protect those systems?

One definition of hubris is refusing to believe something exists because you haven't seen it yourself.

Our task as system administrators is not to make a career out of applying maintenance patches. Our task is to design systems that minimize updates, and then to redesign them to eliminate wasted user interaction and downtime.