Hacker News new | ask | show | jobs
by touisteur 1313 days ago
One of the preternal problems of such hardware watchdogs was the inability to discriminate whether a sudden reboot was due a reset-button, hw security (e.g. temperature), ECC problem, or (micro) loss of power, or HW watchdog.

On most IPMI-capable BIOS/firmware there's now (been for 10 years but I'm old) an option to log 'system' events (ipmi failures like fan speeds if you've set threshold, but also reboot reasons). It's call the System Event Log. Very useful.

And on IPMI-plugged watchdogs, you can also see the state of the HW watchdog (is it running, how many seconds are left). Very useful too.