Hacker News new | ask | show | jobs
by jwatte 3567 days ago
The detail was right there: debugging something in security caused massive logging which caused CPU bottlenecking.

Performance is the hardest thing to integration test for. Keeping careful track of CPU/memory/network/disk load with automated alerts can help.

(Fancy systems like running a traffic replica can help, too, but at a much higher cost.)

1 comments

We actually have a traffic replica (dark client) setup for the new webserver architecture we are gradually migrating to. It likely would have caught this before deploying to users.