by bzbarsky 2815 days ago
Your computer? Not sure. Computers in general? A noticeable fraction of Firefox crash reports are due to single-bit memory flips.

Huh, this is really interesting. Can you go into more detail? How do you know the cause of the error? What kind of numbers are we talking about here?

Also, are cosmic rays really the main source of single-bit flips, as opposed to just bad RAM?

Firefox crash reports include the contents of registers and a few instructions around the instruction pointer. If the crash is in compiled code, not JIT code, you also know where in the binary you were and can get symbols from a symbol server, then compare what the instructions should be to what they actually were.

Some of the time the instructions don't match up, indicating corruption _somewhere_.
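As a rough illustration of that comparison, here is a minimal Python sketch, assuming you already have the expected instruction bytes (from the binary plus symbols) and the actual bytes from the crash report. The function names `diff_bits` and `classify` are hypothetical and not part of any Mozilla tooling; the point is only that XORing expected against actual bytes shows whether exactly one bit differs, which is the signature of a single-bit flip rather than some broader corruption.

```python
from __future__ import annotations


def diff_bits(expected: bytes, actual: bytes) -> list[tuple[int, int]]:
    """Return (byte_offset, bit_index) for every bit that differs."""
    if len(expected) != len(actual):
        raise ValueError("byte sequences must have the same length")
    flips = []
    for offset, (e, a) in enumerate(zip(expected, actual)):
        x = e ^ a  # set bits mark positions where the two bytes disagree
        for bit in range(8):
            if x & (1 << bit):
                flips.append((offset, bit))
    return flips


def classify(expected: bytes, actual: bytes) -> str:
    """Label a mismatch as a match, a single-bit flip, or other corruption."""
    flips = diff_bits(expected, actual)
    if not flips:
        return "match"
    if len(flips) == 1:
        return "single-bit flip"
    return "other corruption"


# Example: x86-64 `mov rbp, rsp` is 48 89 e5; the crash report shows 48 89 e7.
expected = bytes([0x48, 0x89, 0xE5])
actual = bytes([0x48, 0x89, 0xE7])
print(diff_bits(expected, actual))  # [(2, 1)]
print(classify(expected, actual))   # single-bit flip
```

A real analysis would also have to rule out legitimate differences such as relocations or runtime patching before blaming hardware.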

For the specific case of crashes in JIT-generated code, the register contents and the instructions can be cross-checked in various ways (e.g. if the instruction is a jmp through a register, that register had better contain a code address). And since a JIT generates the code and aligns it in memory itself, it knows what its code addresses should look like. If the register holding the jump target looks like exactly that sort of address, but with one extra low bit set, say, that's a strong hint of a bitflip.
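A minimal sketch of that heuristic in Python, assuming the JIT aligns code addresses to a power-of-two boundary (16 bytes here, an arbitrary choice) and that the bounds of the JIT code region are known. The function name and parameters are hypothetical; this is not SpiderMonkey's actual analysis, just the "aligned address plus one stray low bit" check described above.

```python
def likely_low_bit_flip(reg_value: int, code_start: int, code_end: int,
                        alignment: int = 16) -> bool:
    """Guess whether reg_value is an aligned JIT code address with one low bit flipped.

    Assumptions (hypothetical): code addresses are aligned to `alignment`,
    and valid JIT code lives in [code_start, code_end).
    """
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    mask = alignment - 1
    low_bits = reg_value & mask    # bits that should be zero for an aligned address
    aligned = reg_value & ~mask    # the address with those low bits cleared
    if low_bits == 0:
        return False  # properly aligned: nothing suspicious about alignment
    single_bit = (low_bits & (low_bits - 1)) == 0  # exactly one stray bit set
    in_code_region = code_start <= aligned < code_end
    return single_bit and in_code_region


start, end = 0x7F0000000000, 0x7F0000100000
print(likely_low_bit_flip(0x7F0000001230, start, end))  # False (aligned)
print(likely_low_bit_flip(0x7F0000001234, start, end))  # True (one extra low bit)
print(likely_low_bit_flip(0x7F0000001233, start, end))  # False (two low bits)
```

Requiring the cleared address to fall inside the code region keeps random garbage pointers from being misclassified as bitflips.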

I'm having trouble finding the bug report right now where some of the JIT engineers were analyzing crashes in jitcode, but if I recall correctly, about 1/3 of those crashes were due to bitflips. What that means in terms of absolute numbers (or numbers per user-hour, which would be even more useful), I don't know.

Note, by the way, that single-bit flips can be a consequence of a bad memory chip, not just of cosmic rays.

Thanks! Incredibly interesting.