by bananas 4418 days ago
Until you reboot and it doesn't :)

And if you do reboot frequently but then at one point it doesn't work?

You either have a long downtime or have a secondary ready to take over. I understand you're talking about not relying on a false sense of confidence; that's a good rule of thumb, but it shouldn't get in the way of better solutions:

Test the secondary more often, and automate the recovery, and you'll cover more failure cases.

You don't have the resources for a secondary? Then you're too small, and this tool isn't for you either; keep it simple and just reboot.

Moreover, in some contexts it is not desirable to hold a maintenance window before a designated time, such as the weekend. This means that you could leave your system vulnerable for up to a whole week.

I agree with what you're saying about having better recovery, but I think you need some balance between the two.

I'd rather look through a week's worth of changes to work out what borked the restart than two years' worth!

> And if you do reboot frequently but then at one point it doesn't work?

Then you restore from yesterday's backup. It's probably still on-site. If a kernel live patch fails, you might not notice for a long time.

My point is that you should always be prepared for a reboot not working, since it could happen at any time between your maintenance windows.

Frequently exercising a feature is a good idea, especially when you depend on that feature. That's why exercising backup restores is important; you want that to work when you need it. Do you really need a production machine to boot when you need it? I think that's the wrong target to optimize for, given that this failure mode is already covered by backups or a cold/hot secondary, as long as you exercise them frequently.

That said, you do have a point about whether a live patch is reliable, but that has nothing to do with whether increasing the average uptime of a host is a good or bad idea with respect to ensuring that the machine can start.

TBH I wouldn't personally feel very comfortable using this live update thing, but I have no experience with it. I'd be curious to know more though.