Bit flips are totally real; at scale you will definitely see them on large queries. There was a fun talk at DEFCON on bitsquatting: registering domain names that are one bit off from popular ones and then accepting all incoming connections. Attacks like Rowhammer similarly abuse erroneous bit flips. Supposedly Microsoft can detect solar activity based on the number of Windows crash logs they receive.
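To make "one bit off" concrete, here is a rough Python sketch that enumerates the single-bit-flip neighbours of a name. The function name bitsquat_variants and the example domain are made up for illustration, and the validity check is a simplification of real hostname rules (it also flips characters in the TLD, which a real tool would probably filter out).

```python
import string

# Characters allowed in a hostname label (lowercase letters, digits, hyphen).
VALID = set(string.ascii_lowercase + string.digits + "-")

def bitsquat_variants(domain):
    """Return names that differ from `domain` by exactly one flipped bit
    and still look like valid hostnames. Dots are left untouched."""
    variants = set()
    for i, ch in enumerate(domain):
        if ch == ".":
            continue
        for bit in range(8):
            flipped = chr(ord(ch) ^ (1 << bit))
            # DNS is case-insensitive, so a flip to uppercase is not a new name.
            if flipped not in VALID:
                continue
            candidate = domain[:i] + flipped + domain[i + 1:]
            # Simplified rule: labels may not start or end with a hyphen.
            if any(l.startswith("-") or l.endswith("-") for l in candidate.split(".")):
                continue
            variants.add(candidate)
    variants.discard(domain)
    return sorted(variants)

# Hypothetical example domain.
for name in bitsquat_variants("example.com")[:5]:
    print(name)
```

Every name it returns has the same length as the original, which is one way bitsquats differ from many ordinary typos (dropped or doubled letters).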
I remember reading somewhere about that talk being debunked. Maybe someone more resourceful than me can find it.
It was something about being more likely to be a human typo or a config change that rolled out to a bunch of machines. The statistics didn't add up, and it wasn't plausible that bit flips caused it.
There are many places where memory exists. Your processor cache is memory. Registers are memory-ish. Hard drives have memory. If a bit flip happens in your hard drive's memory before the data is written to disk, then it's not unreasonable to think that the bit flip would persist, even through reboots.
I have run queries at large companies and found mistakes most easily explained as bit flips in domain names written to disk. Imagine an environment variable configuring the use of a proxy without proper whitelists, and it's not unimaginable to me that a production machine would be able to speak to machines on the internet at large.
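For anyone who wants to run a similar check on their own logs, a minimal sketch of the comparison might look like this. The helper names and the sample domains are hypothetical, and this is not the actual query described above; it only tests whether an observed name is exactly one bit away from a name you expected to see.

```python
def bit_distance(a, b):
    """Number of differing bits between two equal-length ASCII strings.
    Returns None when the lengths differ (a bit flip never changes length)."""
    if len(a) != len(b):
        return None
    return sum(bin(ord(x) ^ ord(y)).count("1") for x, y in zip(a, b))

def one_bit_off(observed, known_domains):
    """Return the known domains that `observed` differs from by exactly one bit."""
    observed = observed.lower()
    return [k for k in known_domains if bit_distance(observed, k.lower()) == 1]

# Hypothetical list of domains the fleet is expected to talk to.
known = ["example.com", "example.org"]
print(one_bit_off("dxample.com", known))  # 'd' and 'e' differ in a single bit
print(one_bit_off("exampel.com", known))  # a transposition typo, several bits apart
```

A match is only suggestive. Some ordinary typos are also one bit apart (for instance 'a' and 'e'), so repeated hits from the same machines still need the kind of scrutiny discussed elsewhere in this thread.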
I am open to the idea that what I think is happening might not be the mechanics of what is happening, but I find the talk believable, not based on theory, but on actually seeing persisted (and non-persisted) bit flips in domain names queried from data-warehoused logs at world-scale companies.
That was the gist of the response, as far as I remember. It was repeated access from the same set of machines, so it was more likely that one bit flip persisted in a config and was subsequently rolled out to others. So the study grossly overestimated the number of actual bit flips occurring by like 20000x or something.
Unsure about the talk, but I read a research report from someone at, I think, Cisco who registered a bitsquatted second-level domain for a US state (think statenXX.us instead of state.XX.us). The amount of email they received was staggering, and I'd have trouble believing that many people would make that mistake in typing.
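The dot-to-n swap in that example is itself a single bit flip: "." is 0x2E and "n" is 0x6E, which differ only in bit 6. A small sketch to check that (the helper name single_bit_flips is made up for illustration):

```python
import string

# Characters that can appear inside a hostname label.
HOSTNAME_CHARS = set(string.ascii_lowercase + string.digits + "-")

def single_bit_flips(ch):
    """Map each bit position to the character produced by flipping that bit
    in `ch`, keeping only results that are valid hostname characters."""
    result = {}
    for bit in range(8):
        flipped = chr(ord(ch) ^ (1 << bit))
        if flipped in HOSTNAME_CHARS:
            result[bit] = flipped
    return result

print(single_bit_flips("."))  # → {6: 'n'}
```

So "n" is the only hostname character a single flipped bit can turn a dot into, which fits the statenXX.us pattern and is not a mistake a keyboard makes easy.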
"Supposedly" is false. The sun doesn't produce particles with cosmic-ray energies, and soft errors aren't affected by sunspot activity. They are affected by altitude and space weather, but not solar activity.
I was in the audience of the talk. All devices should be required to use ECC because going without it is a security risk. Not as much as in the http:// era, but silent corruption across networks and systems is a thing.
From the link in your comment: "The average rate of cosmic-ray soft errors is inversely proportional to sunspot activity. That is, the average number of cosmic-ray soft errors decreases during the active portion of the sunspot cycle and increases during the quiet portion."
https://blog.mozilla.org/data/2022/04/13/this-week-in-glean-...