What's "the db"? It sounds like something of small to medium scale if you can just restart it like that.
In any case, why not just relocate some vendor engineers on site for a bit? Or, better, why does the vendor not have a small presence in the corner?
Sounds like whatever "the db" is, it's probably some (objectively) small but very scary thing that's currently on fire, and people are trying to figure out how to put it out without crashing the plane or making too many waves internally, which is probably even harder. So asking about bringing the vendor in is (as useful as it may be) probably going down the wrong path, in much the same way that this is probably not related to the outages (it may well be, but from the outside it's all coincidence anyway).
IIS had (or has) a memory leak in worker threads that, many years ago, forced us to restart the server every few days. Starting in 6.0, they added worker thread recycling and made it mandatory to choose a time period after which every thread is recycled. Why fix the error when you can just restart the service?
For old-school mod_perl apps, setting MaxRequestsPerChild was often a much better ROI than actually finding and fixing the leaks.
Speaking as somebody who's done over a decade of large-scale OO Perl applications and is actually really good at finding and fixing the leaks, this has often been intellectually aggravating. But every time I've set that option instead, I've rewarded myself with a glass of bourbon for picking the pragmatic choice and then gone back to adding (non-leaky) features that were far more useful to the company in question than cleaning up the older code would've been.
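For anyone who hasn't run into this pattern, here is a toy sketch of what "recycle the worker after N requests" buys you. It is not how Apache or IIS implement it; `LeakyWorker` and `RecyclingPool` are made-up names, and the "leak" is just a list that grows by 1 KiB per request.

```python
class LeakyWorker:
    """Stand-in for a worker process that leaks a little on every request."""

    def __init__(self):
        self.handled = 0
        self.leaked = []  # simulated leak: grows with every request

    def handle(self, request):
        self.handled += 1
        self.leaked.append(bytearray(1024))  # pretend 1 KiB is never freed
        return f"ok:{request}"


class RecyclingPool:
    """Replaces the worker after max_requests, which bounds how far the leak grows."""

    def __init__(self, max_requests):
        self.max_requests = max_requests
        self.worker = LeakyWorker()
        self.recycles = 0

    def handle(self, request):
        if self.worker.handled >= self.max_requests:
            # Drop the old worker (and everything it leaked) and start fresh.
            self.worker = LeakyWorker()
            self.recycles += 1
        return self.worker.handle(request)


pool = RecyclingPool(max_requests=100)
peak = 0
for i in range(1000):
    pool.handle(i)
    peak = max(peak, len(pool.worker.leaked))
```

The leak is still there, but its worst case is capped at `max_requests` requests' worth of garbage instead of growing until the box falls over.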
For GitHub? It seems unbelievable that they would have used IIS pre-acquisition, and why in the world would you mix in a second web server for post-acquisition enhancements?
Why trade an open source solution for third-rate garbage called IIS, which runs on a sub-par desktop OS called Windows? I thought that GitHub was supposed to be independent.
If GH is around the same level of integration with Microsoft as my employer, which is another Microsoft acquisition, I don't really believe you have a ton of insight into GH processes.
I dated a girl at GitHub for a while last year who said they weren't even completely off of AWS yet, and she liked how it didn't feel like working for Microsoft. Maybe this has changed though.
I'm not a DBA, and maybe you're not a DBA either, so this question goes to DBAs who may be reading: aren't you always better off killing the bad queries instead of rebooting the whole box, if that's an option? (i.e., aside from times when the entire host is screwed: load per core is >50, metrics aren't getting out, you can't ssh in, etc.)
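To make the "kill the bad queries" option concrete, here is a rough sketch, assuming MySQL and assuming you have already fetched rows shaped like `information_schema.PROCESSLIST` (ID, USER, COMMAND, TIME, INFO). The function name, the 60-second threshold, and the sample rows are all made up for illustration; it only builds `KILL QUERY` statements and does not connect to anything.

```python
def kill_statements(processlist, max_seconds=60):
    """Return KILL QUERY statements for client queries running too long.

    processlist: iterable of dicts with PROCESSLIST-style keys.
    max_seconds: hypothetical threshold; tune it for your workload.
    """
    statements = []
    for row in processlist:
        if row.get("USER") == "system user":
            continue  # leave replication/internal threads alone
        if row["COMMAND"] != "Query":
            continue  # idle (Sleep) connections are not running anything
        if row["TIME"] < max_seconds:
            continue  # still within the allowed runtime
        # KILL QUERY stops the statement but keeps the connection open.
        statements.append(f"KILL QUERY {int(row['ID'])};")
    return statements


# Illustrative rows only, not real output from any server.
sample = [
    {"ID": 11, "USER": "app", "COMMAND": "Sleep", "TIME": 900, "INFO": None},
    {"ID": 12, "USER": "app", "COMMAND": "Query", "TIME": 3, "INFO": "SELECT 1"},
    {"ID": 13, "USER": "app", "COMMAND": "Query", "TIME": 412, "INFO": "SELECT ..."},
    {"ID": 14, "USER": "system user", "COMMAND": "Query", "TIME": 5000, "INFO": None},
]
to_run = kill_statements(sample)
```

Of course this only helps in the case the question already carves out: the host is healthy enough that you can still connect and list what is running.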
They could use multiple writer hosts and roll the restarts across them. MySQL has had GTIDs since 5.6, and Group Replication (rather than a single writer with replicas) since some 5.7.x version.
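A rolling restart along those lines could look roughly like this. It is only a sketch: `restart` and `is_healthy` are hypothetical callables you would have to supply (nothing here talks to MySQL), and `min_healthy` is an assumed safety margin for how many other hosts must be up before one is taken down.

```python
def rolling_restart(hosts, restart, is_healthy, min_healthy=1):
    """Restart hosts one at a time, never dropping below min_healthy others.

    restart(host) and is_healthy(host) are caller-supplied (hypothetical) hooks.
    Returns the list of hosts restarted, in order.
    """
    restarted = []
    for host in hosts:
        others_up = sum(1 for h in hosts if h != host and is_healthy(h))
        if others_up < min_healthy:
            raise RuntimeError(f"refusing to restart {host}: only {others_up} other host(s) healthy")
        restart(host)
        if not is_healthy(host):
            # Stop the rollout rather than take down the next host too.
            raise RuntimeError(f"{host} did not come back healthy")
        restarted.append(host)
    return restarted


# Fake three-node cluster to exercise the logic; the names are made up.
state = {"db1": True, "db2": True, "db3": True}
log = []

def fake_restart(host):
    log.append(host)
    state[host] = True  # pretend the host comes back cleanly

def fake_is_healthy(host):
    return state[host]

done = rolling_restart(list(state), fake_restart, fake_is_healthy, min_healthy=2)
```

The point of the health check before each step is that a restart that goes badly halts the rollout with most of the cluster still serving.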