by mekaj 2728 days ago
There was an incident with similar consequences on April 10, 2014. The cause was a programmed threshold being breached and the impact was 6h of downtime.

Source: "The Coming Software Apocalypse" published by The Atlantic (https://www.theatlantic.com/technology/archive/2017/09/savin...)

walrus01 also linked to https://www.fcc.gov/document/april-2014-multistate-911-outag... in another comment.

> Operated by a systems provider named Intrado, the server kept a running counter of how many calls it had routed to 911 dispatchers around the country. Intrado programmers had set a threshold for how high the counter could go. They picked a number in the millions.

I'm really curious if there's some explanation that makes this sound less catastrophically stupid, particularly the part where they picked a threshold less than INT_MAX.
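To make the failure mode concrete, here is a minimal sketch of a running counter with a hard-coded cap, as the quoted passage describes it. The names `CallRouter` and `MAX_CALL_ID` and the cap value are made up for illustration. This is not Intrado's code, which isn't shown anywhere in the article.

```python
# Hypothetical sketch: the names and the cap value are assumptions.
MAX_CALL_ID = 5_000_000  # an arbitrary "number in the millions"


class CallRouter:
    def __init__(self, start=0):
        self.counter = start  # running count of calls routed so far

    def route(self, call):
        # Once the counter reaches the cap, every later call is refused.
        if self.counter >= MAX_CALL_ID:
            return None  # no ID assigned, so the call goes nowhere
        self.counter += 1
        return self.counter  # ID handed to the dispatcher


router = CallRouter(start=MAX_CALL_ID - 1)
first = router.route("caller A")   # last call that still gets an ID
second = router.route("caller B")  # refused: the threshold has been hit
```

Nothing wraps or overflows here. The cap is an ordinary comparison that works as written, so the failure only shows up on the day the counter finally reaches it.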

What do you mean, VARCHAR(8) is a perfectly valid way of storing counters...
Much better to future-proof it by using a BLOB. You know, just in case you ever need to shove something else in there.
If you're going to use a BLOB, how can you know the type and how to decode it? Make sure to serialize that data with something like protobuf.
Nah, describe it using some custom binary format and have a 1000 page spec to document it!
Yay, serialised Bluetooth! ;)
Allocating pieces of a resource without accounting for reusing released resources?
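A rough sketch of what that guess would look like, assuming a fixed pool of IDs. Both classes are hypothetical and not taken from the FCC report: the first never reclaims released IDs, the second puts them back on a free list.

```python
# Hypothetical sketch of the guess above; class names are made up.
class LeakyAllocator:
    """Hands out IDs up to a limit and never reclaims them."""

    def __init__(self, limit):
        self.limit = limit
        self.next_id = 0

    def acquire(self):
        if self.next_id >= self.limit:
            raise RuntimeError("out of IDs")
        self.next_id += 1
        return self.next_id - 1

    def release(self, ident):
        pass  # released IDs are forgotten, so the pool only shrinks


class ReusingAllocator(LeakyAllocator):
    """Same limit, but released IDs go back on a free list."""

    def __init__(self, limit):
        super().__init__(limit)
        self.free = []

    def acquire(self):
        if self.free:
            return self.free.pop()  # reuse a released ID first
        return super().acquire()

    def release(self, ident):
        self.free.append(ident)


def survives(alloc, cycles):
    """Run acquire/release cycles; report whether the pool ran dry."""
    try:
        for _ in range(cycles):
            alloc.release(alloc.acquire())
    except RuntimeError:
        return False
    return True
```

With a limit of 3, the leaky version fails on the fourth cycle even though only one ID is ever in use at a time, while the reusing version keeps going.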
Great question. Maybe they did pick a sensible constant and it still wasn't enough, i.e. even INT_MAX wasn't big enough.