Hacker News new | ask | show | jobs
by bheadmaster 1021 days ago
Runtime asserts and invariant checks in software can also help a lot with isolating bitflip errors. With a nice addition of also isolating effects of software bugs.
5 comments

I don't know if it is significant. Runtime checks tend to focus on small but critical part of the data, like size fields. It usually doesn't check bulk data, like decompressed image data, or code, and it also may not be effective if data is in cache. Furthermore, it will only detect errors, not correct them. Also the performance cost is, I think, much higher than the extra RAM chip. Good coding practice for critical path in software, but clearly, it doesn't substitute for dedicated hardware.

I have had defective RAM, and I got quite a bit of corruption before the first crashes, it is hardly noticeable when it is just a pixel changing color in a picture, but it is still something you don't want. ECC would have prevented that.

I know there is software resistant on random bitflips, like for satellites exposed to cosmic rays, but it is a highly specialized field. It is also a field where they use special chips, typically with coarser (and therefore less efficient) dies that are more resistant to radiation. You leave a lot on the table for that.

ECC is better handled in hardware: most of the time it won’t happen, and the hardware can more easily interrupt the processor so the kernel can correct the problem or signal a fault if it’s not a correctable corruption.
Those only help isolate somewhat predictable errors. Which is rare for what ECC is designed to protect against.

If it’s a random, once in several billion reads/writes issue, it can just stop/identify the bad data from further propagating. Sometimes. That data is still lost.

ECC does forward error correction, which is extremely rare for the type of data protection you’re talking about. and if the data is corrupted in RAM (say when initially loaded/read) before the software can apply FEC, there is nothing the software can do.

I thought that the current wave of compiler correctness checking, zero-cost abstractions, JIT compilers and speculative processor behaviour were all about removing those "unnecessary" runtime asserts and invariant checks to get better performance.
Assuming the compiler doesn't optimize them out.