by raffraffraff 1552 days ago
Yuck. Honestly, restarting a database to fix a major outage sounds like "we have no idea what we're doing"
4 comments

It sounds like "they don't know why it's going down." I've worked with plenty of super competent people who have taken time to root-cause incidents.

Guide to incidents:
Step 1: Stop the bleeding
Step 2: Prevent it in the future

Doing Step 1 doesn't make you incompetent.

I'm not a DBA, and maybe you're not a DBA either, so this question goes to DBAs who may be reading: aren't you always better off killing the bad queries instead of rebooting the whole box, if that's an option? (i.e., aside from times when the entire host is screwed, load per core is >50, metrics aren't getting out, you can't ssh in, etc.)
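For anyone wondering what "killing the bad queries" might look like in practice, here is a rough sketch. It assumes MySQL (the thread doesn't say which database this was) and rows shaped like `information_schema.PROCESSLIST`. The function name `kill_statements`, the 60-second threshold, and the sample rows are all made up for illustration. It only builds the `KILL QUERY` statements, so a human still decides whether to run them.

```python
from typing import Iterable, List, Mapping


def kill_statements(processlist: Iterable[Mapping], max_seconds: int = 60) -> List[str]:
    """Build KILL QUERY statements for long-running queries.

    Assumption: each row looks like a row of MySQL's
    information_schema.PROCESSLIST (keys ID, COMMAND, TIME, INFO).
    """
    statements = []
    for row in processlist:
        # Only consider threads that are actively executing a statement;
        # this skips sleeping connections and other thread types.
        if row["COMMAND"] != "Query":
            continue
        # TIME is the number of seconds the thread has been in its current state.
        if row["TIME"] < max_seconds:
            continue
        # KILL QUERY ends the statement but leaves the connection open.
        statements.append(f"KILL QUERY {int(row['ID'])};")
    return statements


# Hypothetical sample data, not taken from any real incident.
rows = [
    {"ID": 11, "COMMAND": "Sleep", "TIME": 900, "INFO": None},
    {"ID": 12, "COMMAND": "Query", "TIME": 3, "INFO": "SELECT 1"},
    {"ID": 13, "COMMAND": "Query", "TIME": 742, "INFO": "SELECT * FROM big_table"},
]
print(kill_statements(rows))  # ['KILL QUERY 13;']
```

Whether this beats a restart presumably depends on whether the server is still responsive enough to accept the `KILL` at all, which is the caveat in the parenthetical above.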
Sporadic database performance issues can certainly make you feel that way. They are definitely not trivial to debug at scale.
Would you rather it stay down while they spend a day debugging it?
If that means it won't be down every morning in my time zone then yes.
As long as it's announced in advance so that users/customers can plan ahead, I don't see why not.
They could use multiple writer hosts and do rolling restarts. MySQL has had GTIDs since 5.6 and replication groups rather than writer-replicas since some 5.7.x version.
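To make the rolling-restart idea concrete, here is a minimal sketch of the control loop: restart one member, wait until it is healthy again, then move on, and abort if a member doesn't come back. Everything here is hypothetical (the `rolling_restart` name, the host names, the `restart` and `is_healthy` callables). In a real MySQL group the health check would presumably mean something like "the member reports ONLINE again", but that part is left as a placeholder.

```python
import time
from typing import Callable, List, Sequence


def rolling_restart(
    hosts: Sequence[str],
    restart: Callable[[str], None],
    is_healthy: Callable[[str], bool],
    max_checks: int = 30,
    poll_seconds: float = 1.0,
) -> List[str]:
    """Restart hosts one at a time; stop at the first one that fails to recover."""
    done: List[str] = []
    for host in hosts:
        restart(host)
        # Poll until this host is healthy again, so only one member is
        # being restarted at any moment.
        for _ in range(max_checks):
            if is_healthy(host):
                break
            time.sleep(poll_seconds)
        else:
            # Never became healthy: leave the remaining hosts untouched.
            raise RuntimeError(f"{host} did not recover; restarted so far: {done}")
        done.append(host)
    return done


# Placeholder callables standing in for real restart and health-check logic.
restarted = []
result = rolling_restart(
    ["db1", "db2", "db3"],
    restart=restarted.append,
    is_healthy=lambda host: True,
    poll_seconds=0,
)
print(result)  # ['db1', 'db2', 'db3']
```

The abort-on-failure branch is the important design choice: if one member fails to rejoin, continuing would take a second member down while the group is already short one.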