by twic 884 days ago
Exactly. If a service crashes within a second ten times in a row, it's not going to come up cleanly an eleventh time. The right thing to do is stay down, and let monitoring get the attention of a human operator who can figure out what the problem is. Continually rebooting is just going to fill up logs, spam other services, and generally make trouble.

I'm sure there are exceptions to this. For those, set Restart=always. But it's an absolutely terrible default.
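
For what it's worth, here is a minimal sketch of what "give up after ten quick crashes" could look like in a unit file. The unit name, binary path and the exact numbers are made up for illustration. The directives themselves are the start rate limiting options described in systemd.unit and systemd.service.

```ini
# /etc/systemd/system/example.service (hypothetical unit)
[Unit]
Description=Example daemon (hypothetical)
# Allow at most 10 starts within any 30 second window; after that the
# unit enters the failed state and stays down until someone intervenes.
StartLimitIntervalSec=30
StartLimitBurst=10

[Service]
# Hypothetical binary path.
ExecStart=/usr/local/bin/example-daemon
# Restart on crashes and non-zero exits, but not after a clean exit.
Restart=on-failure
# Wait one second between attempts, so ten attempts fit inside the window.
RestartSec=1
```

Once the limit trips, 'systemctl reset-failed example.service' followed by a manual start should bring the unit back after the underlying problem is fixed.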


It might actually, if a network connection is temporarily down.
Or a disk not attached yet. Or another service it depends on being slow to finish starting up.
So, you two know how systemd gets heat for doing too much, right?

This is one of those things.

The 'After=' and 'Requires=' directives address this.

Depends on a mount? Point those directives at a '.mount' unit.

Depends on networking, perhaps a specific NIC? Point those directives at 'systemd-networkd-wait-online@$REQUIRED_NIC.service'

Point being: declare these things, don't wait for entropy to eventually become stable.
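
A rough sketch of what that declaration might look like, assuming the data lives on /srv/data (so the mount unit would be named 'srv-data.mount') and that systemd-networkd manages an interface called eth0. The unit name, path and interface are placeholders, not anything from a real setup.

```ini
# /etc/systemd/system/example.service (hypothetical unit)
[Unit]
Description=Example daemon (hypothetical)
# Requires= pulls the listed units in; After= delays our start until
# they have finished starting. Both are needed to get "wait for it".
Requires=srv-data.mount systemd-networkd-wait-online@eth0.service
After=srv-data.mount systemd-networkd-wait-online@eth0.service

[Service]
# Hypothetical binary path.
ExecStart=/usr/local/bin/example-daemon
```

'Requires=' on its own only expresses the requirement, not the ordering, which is why the same units are repeated under 'After='.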

'After=' and 'Requires=' only take effect when starting the service, though. If a service (stupidly) crashes when a network connection is temporarily down (someone tripped over the router's power cord?), it needs to keep restarting until the network connection is back up.
Sure, but now we're kind of back where we started: 'Restart='

With the requirements properly laid out, we've avoided restarting in a loop and gained a bit of robustness.

There's also 'PartOf=' which can help make the relationship bidirectional
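
A small sketch of 'PartOf=', with made-up unit names. As far as I read the systemd.unit documentation, the propagation goes one way: stopping or restarting the listed unit is passed on to the unit that declares 'PartOf=', not the reverse. So treat this as one half of the relationship rather than a full two-way binding.

```ini
# /etc/systemd/system/example-worker.service (hypothetical unit)
[Unit]
Description=Worker that belongs to example.service (hypothetical)
# Stopping or restarting example.service also stops or restarts this unit.
PartOf=example.service
# Start this unit only after the unit it belongs to.
After=example.service

[Service]
# Hypothetical binary path.
ExecStart=/usr/local/bin/example-worker
```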

I get your point, but these features are the bare minimum any boot system should have. If someone calls that “bloat”, they should go back and hit rocks together.
Agreed. Relationships between units are a core principle of 'init'.

Back on point though: don't expect the 11th restart to work when the last 10 didn't.

Contrived examples are contrived; this is solved by declaring dependencies.

Interestingly, the Kubernetes approach is the opposite one. Dependencies between pods / software components are encouraged to be a little softer, so that the scheduler is simpler.

Starting up, noticing that the environment doesn't have what you need yet and dying quickly appears to be The Kubernetes Way. A scheduler will eventually restart you and you'll have another go. Repeat until everything is up.

The kubelet operates the same way afair. On a node that hasn't joined a cluster yet, it sits in a fail/restart loop until it's provisioned.