by krobertson 5578 days ago
Their problem isn't their deployment process, it's their monitoring.

Blindly ignoring errors is a recipe for failure. You should always look at a situation like that and ask, "How can we monitor this weak point?" Logging plus a service like Splunk works great.
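
To make that concrete, here is a rough sketch of logging and counting a failure instead of swallowing it. The names (`run_step`, `error_counts`, the `deploy` logger) are made up for illustration, and shipping the log lines to an aggregator like Splunk and alerting on them is assumed to happen elsewhere.

```python
import logging

# Hypothetical logger name; in practice its output would be forwarded
# to whatever log aggregation service you use.
logger = logging.getLogger("deploy")


def run_step(name, step, error_counts):
    """Run one step; log and count a failure instead of silently ignoring it."""
    try:
        step()
        return True
    except Exception:
        # logger.exception logs at ERROR level and includes the traceback.
        logger.exception("step failed: %s", name)
        # Keep a per-step failure count that a monitor could alert on.
        error_counts[name] = error_counts.get(name, 0) + 1
        return False
```

The point is only that every failure leaves a trace somewhere a person or an alert can see it; the caller still decides whether to continue after a `False` return.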

You should also always have a solid on-call rotation. We have two rotations: an ops one, which is the first line, and a dev one in case deeper code changes or more eyes are needed.