HN Mirror



	by twic 884 days ago
	Exactly. If a service crashes within a second ten times in a row, it's not going to come up cleanly an eleventh time. The right thing to do is stay down, and let monitoring get the attention of a human operator who can figure out what the problem is. Continually rebooting is just going to fill up logs, spam other services, and generally make trouble. I'm sure there are exceptions to this. For those, set Restart=always. But it's an absolutely terrible default.
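
	For illustration, a minimal sketch of opting in to restarts while still staying down after repeated failures. The unit name, binary path, and limit values are hypothetical, chosen only to mirror the "ten times in a row" case. Restart= belongs in [Service], while the start rate limit belongs in [Unit].

```ini
# example.service (hypothetical unit name and binary path)
[Unit]
Description=Example daemon with bounded restarts
# Give up if the service is started more than 10 times within 60 seconds.
StartLimitIntervalSec=60
StartLimitBurst=10

[Service]
ExecStart=/usr/local/bin/example-daemon
# Opt in to automatic restarts, but only after an unclean exit.
Restart=on-failure
# Wait 1 second between attempts.
RestartSec=1

[Install]
WantedBy=multi-user.target
```

	Once the limit is hit, systemd should stop retrying and leave the unit in a failed state, which is the point where monitoring and a human take over.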


BenjiWiebe 884 days ago

It might actually, if a network connection is temporarily down.


rendaw 884 days ago

Or a disk not attached yet. Or another service it depends on being slow to finish starting up.


bravetraveler 884 days ago

So, you two know how systemd gets heat for doing too much, right?

This is one of those things.

The 'After=' and 'Requires=' directives address this.

Depends on a mount? Point those directives at a '.mount' unit.

Depends on networking, perhaps a specific NIC? Point those directives at 'systemd-networkd-wait-online@$REQUIRED_NIC.service'.

Point being: declare these things, don't wait for entropy to eventually become stable.
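
A rough sketch of what declaring these could look like. The unit name, the /mnt/data mount point, the eth0 interface, and the binary path are all assumptions for illustration, and the wait-online unit only helps if systemd-networkd manages that interface.

```ini
# app.service (hypothetical; assumes a mount at /mnt/data and a NIC named eth0)
[Unit]
Description=App that needs its data disk and a specific NIC
# Requires= pulls these units in and fails this unit if they cannot start.
# After= only orders this unit after them, so the two are used together.
Requires=mnt-data.mount systemd-networkd-wait-online@eth0.service
After=mnt-data.mount systemd-networkd-wait-online@eth0.service

[Service]
ExecStart=/usr/local/bin/app

[Install]
WantedBy=multi-user.target
```

The mount unit name is derived from the path, so /mnt/data becomes mnt-data.mount.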


BenjiWiebe 875 days ago

After= and Requires= only apply when starting the service, though. If a service (stupidly) crashes when a network connection is temporarily down (someone tripped over the router's power cord?), it needs to keep restarting until the network connection is back up.
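
Something like the following sketch is what I have in mind. The unit name, binary path, and 30-second delay are hypothetical. Setting the start limit interval to 0 turns off rate limiting, so systemd keeps retrying for as long as the outage lasts.

```ini
# flaky-client.service (hypothetical name and binary path)
[Unit]
Description=Client that exits when the network drops
# 0 disables start rate limiting, so systemd never gives up on restarts.
StartLimitIntervalSec=0

[Service]
ExecStart=/usr/local/bin/flaky-client
Restart=on-failure
# Pause between attempts so the retry loop does not spam logs or peers.
RestartSec=30
```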


bravetraveler 873 days ago

Sure, but now we're kind of back where we started: 'Restart='

With the requirements properly laid out, we've avoided restarting in a loop and gained a bit of robustness.

There's also 'PartOf=' which can help make the relationship bidirectional
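
A small sketch of 'PartOf=', assuming a hypothetical worker.service that should follow a separate backend.service. As far as I understand the systemd.unit documentation, the propagation runs from the listed unit to this one: stopping or restarting the backend does the same to the worker, but not the other way around.

```ini
# worker.service (hypothetical; assumes a separate backend.service exists)
[Unit]
Description=Worker tied to the lifecycle of backend.service
After=backend.service
# When backend.service is stopped or restarted, do the same to this unit.
# PartOf= does not start backend.service by itself.
PartOf=backend.service

[Service]
ExecStart=/usr/local/bin/worker
Restart=on-failure
```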


kaba0 884 days ago

I get your point, but these features are the bare minimum any boot system should have. If someone calls that “bloat”, they should go back and hit rocks together.


bravetraveler 884 days ago

Agreed. Relationships in 'init' are fundamental.

Back on point though: don't expect the 11th restart to work when the last 10 didn't.

Contrived examples are contrived; it's solved by declaring dependencies.


growse 882 days ago

Interestingly, the Kubernetes approach is the opposite one. Dependencies between pods / software components are encouraged to be a little softer, so that the scheduler is simpler.

Starting up, noticing that the environment doesn't have what you need yet and dying quickly appears to be The Kubernetes Way. A scheduler will eventually restart you and you'll have another go. Repeat until everything is up.

The kubelet operates the same way, AFAIR. On a node that hasn't joined a cluster yet, it sits in a fail/restart loop until it's provisioned.
