

by acdha 894 days ago

> Is just fine.

It’s prolonged downtime, that’s what it is, and it may make the system unusable. I saw that several times during the transition period, when a network outage or data center shutdown meant that some important daemons failed on startup (one of them retried on connection errors but not on DNS resolution failures). The admins of the SysV boxes had to manually restart everything, while the Upstart and systemd boxes recovered almost immediately. One fun case had the developer almost getting this right: they had implemented retries, but without a delay or backoff, so their init script maxed out its retry count in a second of “host not found” errors and then exited permanently.

When you leave things like retries, logging, or dropping permissions up to each daemon, you end up with a hodgepodge of incomplete implementations, and error handling is where corners tend to get cut the most, since those situations are infrequent and often hard to simulate.
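
To make the retry pitfall above concrete, here is a rough sketch of what the "almost right" daemon was missing: a delay between attempts that grows each time, with DNS failures treated as retryable just like connection errors. This is not the code from any of the daemons mentioned; the function name, parameters, and defaults are all made up for illustration.

```python
import socket
import time


def connect_with_backoff(host, port, max_attempts=8, base_delay=1.0,
                         max_delay=60.0, connect=socket.create_connection,
                         sleep=time.sleep):
    """Connect to (host, port), retrying with exponential backoff.

    Hypothetical helper: `connect` and `sleep` are injectable only so the
    behaviour can be exercised without a real network or real waiting.
    """
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect((host, port))
        except OSError:
            # socket.gaierror (DNS resolution failure) is a subclass of
            # OSError, so "host not found" is retried the same way as a
            # refused or timed-out connection.
            if attempt == max_attempts:
                raise
            # Wait before the next attempt so the retry budget is not
            # burned through in the first second of an outage.
            sleep(delay)
            delay = min(delay * 2, max_delay)
```

With these assumed defaults the waits are 1, 2, 4, 8, 16, 32, and 60 seconds, so the daemon keeps trying for about two minutes before giving up, instead of exhausting its retry count almost instantly. Having the init system restart the service gets you the same kind of recovery without every daemon reimplementing it.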

1 comment

toast0 894 days ago

> It’s prolonged downtime, that’s what it is, and it may make the system unusable

IMHO, that's how I feel about systemd: if a startup task is stalled, the system is non-interactive, unless something has changed since I last ran into this issue.

If some other operator access has started, you can use that, but the console is useless.
