by vidarh 1188 days ago

So you did automatisation in a broken way. Here's one way to avoid the issues you described on bare metal:

- Only get servers with IPMI so you can remote reboot / power cycle them.

- Have said servers netboot so they always run the newest OS image.

- Make sure said OS image has a config that isn't broken so you don't get full inodes and so it cycles logs.

- Have the OS image include journalbeat to ship logs.

- Have your health checks trigger a recovery script that restarts or moves containers using one of a myriad of tools; monitoring isn't exactly a new discipline.
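As a rough illustration of that last point, a health check feeding a recovery decision could look something like the sketch below. This is not taken from any particular monitoring tool: the `Health` class, the `decide` function and the thresholds are all hypothetical names and values chosen for the example. It restarts a container after a few consecutive failed checks, and gives up and asks for a move to another host once restarts have not helped.

```python
from dataclasses import dataclass


@dataclass
class Health:
    """Per-container bookkeeping (hypothetical; not from any real tool)."""
    consecutive_failures: int = 0
    restarts: int = 0


def decide(state: Health, check_ok: bool,
           restart_after: int = 3, max_restarts: int = 2) -> str:
    """Return 'none', 'restart' or 'move' for one health check result.

    Thresholds are illustrative assumptions, not recommendations.
    """
    if check_ok:
        # A passing check clears all failure history.
        state.consecutive_failures = 0
        state.restarts = 0
        return "none"

    state.consecutive_failures += 1
    if state.consecutive_failures < restart_after:
        return "none"  # tolerate short blips

    # Enough failures in a row: act, and start counting again.
    state.consecutive_failures = 0
    if state.restarts < max_restarts:
        state.restarts += 1
        return "restart"

    # Restarting did not help; hand the container to another host.
    state.restarts = 0
    return "move"
```

The caller would map "restart" and "move" onto whatever it actually uses (a container runtime call, an IPMI power cycle, a reschedule); keeping the decision a pure function makes it easy to test without touching real machines.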

Yes, it means you have to have a build process for OS images. Yes, it means you need to pick a monitoring system. And yes, it means you need to decide a scheduling policy.

I wrote an orchestrator pre-K8S that was fewer LOC than the YAML config for my home test K8S cluster. Writing a custom orchestrator is often not hard, depending on your workload; writing a generic one is.
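To give a sense of how small a workload-specific scheduling policy can be, here is a minimal sketch. It is not the orchestrator mentioned above; the `place` function, the single "capacity units" number per node and the greedy policy are assumptions made up for illustration. It places the largest containers first, each on the node with the most free capacity.

```python
from __future__ import annotations


def place(containers: dict[str, int], nodes: dict[str, int]) -> dict[str, str]:
    """Greedy placement: biggest container first, onto the emptiest node.

    `containers` maps name -> required units, `nodes` maps name -> capacity.
    Both the unit model and the policy are illustrative assumptions.
    """
    free = dict(nodes)  # remaining capacity per node
    assignment: dict[str, str] = {}
    # Sort by size descending, then name, so the result is deterministic.
    for name, need in sorted(containers.items(), key=lambda kv: (-kv[1], kv[0])):
        if not free:
            raise RuntimeError("no nodes available")
        # Node with the most free capacity; ties go to the first name alphabetically.
        node = max(sorted(free), key=lambda n: free[n])
        if free[node] < need:
            raise RuntimeError(f"no node has room for {name}")
        free[node] -= need
        assignment[name] = node
    return assignment
```

When a node fails, calling `place` again with that node removed from `nodes` yields a new assignment, and the difference from the old one says what to move. A generic orchestrator has to cope with affinity rules, multiple resource types, preemption and so on, which is where the real complexity comes from.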

K8S provides one opinionated version of what people build manually, and when it's a good fit, it's great. When it isn't, I all too often see people spend more time trying to figure out how to make it work for them than it would've taken them to do it from scratch.