Hacker News
by breckognize 809 days ago
When I worked on S3, I was briefly responsible for reporting waste in the system. The basic equation was [Total capacity of hard drives] - [Number of bytes customers are paying for] * [Replication factor] = Waste.
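A minimal sketch of that equation, assuming every quantity is measured in bytes. The function name `waste_bytes` and the numbers are made up for illustration. It uses `u128` so the product has plenty of headroom, and it returns a signed value so an over-committed fleet would show up as negative waste instead of wrapping.

```rust
/// Hypothetical helper: Waste = total capacity - (paid bytes * replication factor).
/// u128 inputs leave far more headroom than exabyte-scale totals need.
fn waste_bytes(total_capacity: u128, paid_bytes: u128, replication_factor: u128) -> i128 {
    // Multiplication binds tighter than subtraction, matching the equation above.
    total_capacity as i128 - (paid_bytes * replication_factor) as i128
}

fn main() {
    // Illustrative numbers only: 10 PB of raw disk, 2 PB paid for, 3x replication.
    let pb: u128 = 1_000_000_000_000_000;
    let waste = waste_bytes(10 * pb, 2 * pb, 3);
    println!("waste = {} bytes", waste); // 4 PB
}
```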

One week, as I was preparing the report, it was clear something had gone haywire: waste was roughly equal to total capacity. So either we'd lost all of our customers overnight, or there was a bug. It turned out the legacy billing system was using a long to count the number of paid bytes, and it had overflowed. So it does happen.
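The story doesn't say which language the billing system used. Assuming "long" means a signed 64-bit integer that wraps silently on overflow (as Java's `long` does), the failure looks roughly like this. `add_paid_bytes` is a hypothetical name, and `wrapping_add` is used to reproduce that silent wrap explicitly.

```rust
/// Hypothetical accumulator that, like a Java `long`, wraps silently on overflow.
fn add_paid_bytes(total: i64, delta: i64) -> i64 {
    total.wrapping_add(delta)
}

fn main() {
    // Pretend the running total sits just below the signed 64-bit limit (about 8 EiB).
    let total = i64::MAX - 10;
    let after = add_paid_bytes(total, 100);
    // No error is raised; the counter has simply wrapped to a large negative number.
    println!("paid bytes after wrap: {}", after);
}
```

Any report computed downstream from that counter is now meaningless, and nothing flagged it.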

This is a classic example of why overflow should not be defined as wrapping (as some people want, since that's what the CPU naturally does for basic arithmetic). Instead it should be an error, and if you didn't handle it (and in many cases you hadn't even realised it could happen, so why would you?), a fatal one.

If the report runs and says something like "Fatal: Overflow while multiplying customer_space * repl_factor. Consider floating point numbers or a larger integer type", you'd go "Oops", fix the bug, and run it again.
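Here is a sketch of the "overflow is an error" behaviour using Rust's `checked_mul`, which returns `None` instead of wrapping. `OverflowError` and `replicated_bytes` are hypothetical names, and the message simply mirrors the one quoted above. A real report job would presumably abort on the `Err` branch. This sketch only prints it.

```rust
use std::fmt;

/// Hypothetical error carrying the kind of message suggested above.
#[derive(Debug, PartialEq)]
struct OverflowError {
    op: &'static str,
}

impl fmt::Display for OverflowError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(
            f,
            "Fatal: Overflow while {}. Consider floating point numbers or a larger integer type",
            self.op
        )
    }
}

/// Multiplies with overflow detection instead of silent wrapping.
fn replicated_bytes(customer_space: u64, repl_factor: u64) -> Result<u64, OverflowError> {
    customer_space
        .checked_mul(repl_factor)
        .ok_or(OverflowError { op: "multiplying customer_space * repl_factor" })
}

fn main() {
    for (space, factor) in [(1_000u64, 3u64), (u64::MAX / 2, 3)] {
        match replicated_bytes(space, factor) {
            Ok(n) => println!("replicated bytes: {}", n),
            Err(e) => eprintln!("{}", e),
        }
    }
}
```

Rust itself takes a middle path on plain `+` and `*`: debug builds panic on integer overflow, while release builds wrap unless `overflow-checks` is enabled in the build profile. The explicit `checked_*` methods make the intent independent of build settings.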

A u64 overflows at 16 exbibytes (2^64 bytes, roughly 18.4 exabytes), so it sounds plausible. Cool story.
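The arithmetic behind that limit, as a quick check: 2^64 bytes divided by 2^60 bytes per EiB gives 16 EiB for an unsigned 64-bit counter. A signed 64-bit long, as in the original story, tops out at half that.

```rust
fn main() {
    let eib: u128 = 1 << 60; // 1 exbibyte = 2^60 bytes
    let u64_limit = u64::MAX as u128 + 1; // 2^64: first value a u64 cannot hold
    let i64_limit = i64::MAX as u128 + 1; // 2^63: first value an i64 cannot hold
    println!("u64 wraps at {} EiB", u64_limit / eib); // 16
    println!("i64 wraps at {} EiB", i64_limit / eib); // 8
}
```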