by zzzcpan 3567 days ago
Strangely, the report has no actual technical details and puts the blame on process, although most of the time better architecture offers some way to keep bugs from causing problems.
1 comment

The detail was right there: debugging something security-related produced massive logging, which bottlenecked the CPU.

Performance is the hardest thing to cover with integration tests. Keeping careful track of CPU, memory, network, and disk load with automated alerts can help.
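
For illustration, here is a minimal sketch of that kind of automated alert. The metric names, the limits, and the "three bad samples in a row" rule are all assumptions made up for this example, not anything from the report. How the samples get collected is left out.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One reading of host load (field names are hypothetical)."""
    cpu_pct: float
    mem_pct: float
    disk_pct: float
    net_mbps: float


# Hypothetical limits; real values depend on the service and hardware.
LIMITS = {"cpu_pct": 85.0, "mem_pct": 90.0, "disk_pct": 80.0, "net_mbps": 800.0}


def breaches(sample, limits=LIMITS):
    """Return one message per metric that is above its limit."""
    messages = []
    for name, limit in limits.items():
        value = getattr(sample, name)
        if value > limit:
            messages.append(f"{name}={value:.1f} exceeds {limit:.1f}")
    return messages


def sustained(history, limits=LIMITS, min_consecutive=3):
    """True if each of the last min_consecutive samples breaches a limit.

    Requiring several bad samples in a row avoids paging on a single spike.
    """
    recent = history[-min_consecutive:]
    return len(recent) == min_consecutive and all(
        breaches(s, limits) for s in recent
    )
```

Feeding it a history where CPU sits at 97% for three samples makes sustained() return True, which is roughly the signal a logging-induced CPU bottleneck would produce.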

(Fancy systems like running a traffic replica can help, too, but at a much higher cost.)

We actually have a traffic replica (dark client) setup for the new webserver architecture we are gradually migrating to. It likely would have caught this before deploying to users.
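
A rough sketch of the dark-client idea in general, not the setup described above: every request is served by the current handler, a copy goes to the new one, and only the comparison is recorded. The names handle_with_shadow, primary, shadow, and report are hypothetical, and a real deployment would send the shadow call out of band rather than inline.

```python
import time


def handle_with_shadow(request, primary, shadow, report):
    """Serve from primary; replay the same request against shadow.

    The caller always gets the primary response. Shadow failures are
    reported, never raised, so the new code path cannot hurt users.
    """
    start = time.perf_counter()
    response = primary(request)
    primary_s = time.perf_counter() - start

    try:
        start = time.perf_counter()
        shadow_response = shadow(request)
        shadow_s = time.perf_counter() - start
        report({
            "request": request,
            "match": shadow_response == response,
            "primary_s": primary_s,
            "shadow_s": shadow_s,
            "error": None,
        })
    except Exception as exc:  # the shadow path must never break the user path
        report({
            "request": request,
            "match": False,
            "primary_s": primary_s,
            "shadow_s": None,
            "error": repr(exc),
        })
    return response
```

Comparing shadow_s against primary_s across real traffic is what would surface a CPU regression before the new path ever serves users.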