by vlovich123
1526 days ago
Here’s contradicting evidence to your position: https://static.googleusercontent.com/media/research.google.c...

The point OP makes is that the more complicated a claim is, the more evidence it requires. More common sources of error are more likely to be the cause of any given bit flip, so the cosmic ray hypothesis needs more evidence than the alternatives before it can be called the dominant cause.

We know empirically that there’s ~1 bug in every 1k lines of code, or 1 in 10k if you have very good tests. Bugs that manifest as bit flips are probably less common, so let’s guess 1 in 10 million lines. There are ~30 million lines of code in the Linux kernel, and probably a comparable amount of userspace code (e.g. Firefox is around 20 million lines). Then think about the Verilog that backs HW designs. I don’t know the size of those codebases well enough to estimate, but it feels like bit flip bugs are possible there. Then you’ve got to actually synthesize that digital logic and implement it in analog space. Components could easily be driven out of spec electrically (whether by accident, manufacturing defect, or swapping in lower-cost components), and bit flips would be a comparatively common type of error when shuttling data around, especially across high-bandwidth links that aren’t error-checked.

The point is, the combined probability of all these sources of error seems higher than that of true cosmic rays being behind bit flips. The Google paper is just more evidence of this. I’m sure that if you measure just for cosmic rays you’ll be able to see their impact. In production at scale, on variable-quality hardware running arbitrary software versions, all the other sources of error seem like more likely first-order effects that would swamp any ability to detect cosmic rays.

That’s not to say Mozilla hasn’t accounted for it, just that OP’s position is the sensible default to start from (i.e. Occam’s razor).
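To make the back-of-envelope numbers above concrete, here is a minimal sketch of the arithmetic. The rate of 1 bit-flip-like bug per 10 million lines is the guess from the comment, not a measured figure, and the line counts are the rough ones quoted above. The names `expected_bugs`, `estimate`, and `CODEBASE_LINES` are made up for illustration.

```python
# Back-of-envelope estimate of latent "bit-flip-like" software bugs.
# All inputs are the rough guesses from the comment, not measurements.

# Guess: one bug that manifests as a bit flip per 10 million lines of code.
LINES_PER_BIT_FLIP_BUG = 10_000_000

# Approximate codebase sizes quoted in the comment (lines of code).
CODEBASE_LINES = {
    "linux_kernel": 30_000_000,
    "firefox": 20_000_000,
}


def expected_bugs(lines: int, lines_per_bug: int) -> float:
    """Expected number of bugs, given one bug per `lines_per_bug` lines."""
    return lines / lines_per_bug


def estimate(codebases: dict[str, int], lines_per_bug: int) -> dict[str, float]:
    """Expected bug count for each codebase under the same assumed rate."""
    return {
        name: expected_bugs(lines, lines_per_bug)
        for name, lines in codebases.items()
    }


if __name__ == "__main__":
    per_codebase = estimate(CODEBASE_LINES, LINES_PER_BIT_FLIP_BUG)
    for name, count in per_codebase.items():
        print(f"{name}: ~{count:g} expected bit-flip-like bugs")
    print(f"total: ~{sum(per_codebase.values()):g}")
```

Under these guesses that works out to roughly 3 such bugs in the kernel and 2 in Firefox. That is a count of latent bugs, not a rate of observed flips, and it leaves out the hardware-side sources (HDL bugs, out-of-spec components, unchecked links) that the comment also lists.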