Hacker News
by jeffreyq 26 days ago
https://blog.railway.com/p/incident-report-february-11-2026
1 comment

"we did not have the monitoring or controls to prevent our anti-fraud from hard killing 3% of workloads, including many instances of pg"

Oof.

Needs an anti-anti-fraud service which terminates malfunctioning anti-fraud services.
When I've written similar services, there was a (low) hard cap on how many fraud decisions they could action before they quit and paged. If we were getting hit with a wave of something, a human had to temporarily bump that limit.
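Here is a minimal Python sketch of the kind of cap described in the comment above. It assumes a sliding time window and an injected paging callback. The class name, `base_max`, `window_seconds`, `page`, and `operator_override` are all hypothetical; they do not come from Railway's incident report or from the commenter's systems.

```python
import time
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class EnforcementBudget:
    """Hypothetical hard cap on automated fraud actions (e.g. killing workloads).

    At most `base_max` actions may fire per sliding window. Hitting the cap
    trips the breaker: enforcement stops and a human is paged. It stays off
    until an operator re-arms it with a temporary, higher limit.
    """

    def __init__(self, base_max: int, window_seconds: float,
                 page: Callable[[str], None],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.base_max = base_max
        self.window_seconds = window_seconds
        self.page = page          # assumed hook into an on-call pager
        self.clock = clock
        self.tripped = False
        self._recent: Deque[float] = deque()   # timestamps of recent actions
        self._override: Optional[Tuple[int, float]] = None  # (cap, expires_at)

    def _cap(self, now: float) -> int:
        # A human override applies until it expires, then the low cap returns.
        if self._override is not None and now < self._override[1]:
            return self._override[0]
        self._override = None
        return self.base_max

    def try_act(self, action: Callable[[], None]) -> bool:
        """Run `action` only if the budget allows it; return whether it ran."""
        if self.tripped:
            return False
        now = self.clock()
        # Forget actions that have aged out of the sliding window.
        while self._recent and now - self._recent[0] >= self.window_seconds:
            self._recent.popleft()
        cap = self._cap(now)
        if len(self._recent) >= cap:
            self.tripped = True   # quit: refuse everything until a human steps in
            self.page(f"anti-fraud hit its cap of {cap} actions per "
                      f"{self.window_seconds:g}s; enforcement halted")
            return False
        self._recent.append(now)
        action()
        return True

    def operator_override(self, new_max: int, duration_seconds: float) -> None:
        """A human temporarily raises the cap and re-arms the breaker."""
        self._override = (new_max, self.clock() + duration_seconds)
        self.tripped = False
```

The sliding window means a slow, steady trickle of legitimate enforcement never trips the breaker, while a sudden spike stops after `base_max` actions. Once tripped, the breaker deliberately does not reset when the window clears. That matches the "quit and paged" behavior: only a human call to `operator_override` resumes enforcement, and the raised limit lapses back to the low default on its own.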