by kjeetgill
2815 days ago
Huh, this is really interesting. Can you go more into this? How do you know the cause of the error? What kinda numbers are we talking here? Also, are cosmic rays really the main source of single bit flips, as opposed to just bad RAM maybe?
Some of the time the instructions don't match up, indicating corruption _somewhere_.
For the specific case of crashes in JIT-generated code, the contents of registers and the instructions can be related in various ways (e.g. if you have a jmp instruction, the register had better contain your code location). And if you know where your code locations might be (because you're a JIT, and are generating the code and aligning it in memory yourself), and the register with the code location looks like the sort of address you would end up with but with one extra low bit set, say, then a bit flip is the most plausible explanation.
I am having trouble right now finding the bug report where some of the JIT engineers were analyzing crashes in jitcode, but about 1/3 of those were due to bitflips if I recall correctly. What that means in terms of absolute numbers (or numbers per user-hour, which would be even more useful), I don't know.
Note, by the way, that single-bit flips can be a consequence of a bad memory chip, not just of cosmic rays.