by dom0 3183 days ago
Well, ECC memory uses eight extra bits on the data bus, backed by an extra chip or chips (depending on the module's organisation); ECC memory modules effectively store one extra eighth of redundant data.

However, in many applications we find that using (forward) error correction almost always increases data density (for storage) or bandwidth (for transmission), simply because a FEC stream does not require a nearly-perfect channel any more. This is the way hard disks, SSDs, WiFi, LTE, DSL, ..., satellite communications, ...[, ...][, ...] are able to cram incredible amounts of data into very noisy channels. Thus, ECC significantly lowers cost in many dimensions (be it frequency spectra, storage prices, not having to re-cable entire countries...).

(And if you don't use the extra noise margin to increase density/bandwidth, then you can use it to increase reliability, like we usually do with ECC memory)
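To put a rough number on the density/bandwidth point, here is a small Python sketch using the binary symmetric channel, an idealized textbook model and not a description of any of the real channels listed above. The function name `bsc_capacity` is made up for illustration. It computes the Shannon capacity 1 - H(p), i.e. the best code rate any error-correcting scheme could achieve at raw bit error rate p.

```python
import math

def bsc_capacity(p):
    """Shannon capacity (bits per channel use) of a binary symmetric
    channel that flips each bit independently with probability p."""
    if p in (0.0, 1.0):
        return 1.0  # binary entropy is zero at both extremes
    h = -p * math.log2(p) - (1 - p) * math.log2(1 - p)  # binary entropy H(p)
    return 1.0 - h

for p in (1e-6, 1e-3, 1e-2):
    print(f"raw bit error rate {p:g}: capacity {bsc_capacity(p):.4f}")
```

Under this model, a channel that flips one bit in a hundred still has a capacity of about 0.92, so in principle a good code gives up roughly 8% of the raw rate in exchange for tolerating a channel that would be unusable without coding.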

Thinking about it for a few minutes, the memory bus will most likely be the only bus in your computer that has no error correction/detection. USB, SATA, PCIe, all of them require it. The main memory will also most likely be the only storage that doesn't use it (apart from firmware flash chips and the like, but these often use a checksum at least).

2 comments

> However, in many applications we find that using (forward) error correction almost always increases data density (for storage) or bandwidth (for transmission), simply because a FEC stream does not require a nearly-perfect channel any more.

Do you mean that ECC memory has those benefits, or that other applications of error-correcting codes have them?

The reliability boost that ECC DRAM gives you could be reinterpreted as extra headroom for overclocking the DRAM before it becomes too unstable. Since the parity bits are carried on extra data lines, they aren't subtracting from your usable memory bandwidth, so the net effect may be a substantial performance advantage when operating at equivalent reliability levels. The main concern is whether the memory controller can correct errors without a severe latency penalty. The ECC used for DRAM is far simpler than the LDPC used for things like SSDs, so it's probably not an issue. (However, systems halting on the detection of an uncorrectable double-bit error would be an inconvenience.)
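For a sense of how simple that kind of code can be, here is a minimal Python sketch of an extended Hamming code (single error correction, double error detection), the textbook scheme usually associated with ECC DRAM. It is an assumption that any particular memory controller uses exactly this construction, and the names `encode` and `decode` are made up for illustration.

```python
def encode(data_bits):
    """Encode a list of 0/1 data bits into an extended Hamming codeword.
    Index 0 holds the overall parity bit; indices 1..n hold the Hamming
    code, with check bits at the power-of-two positions."""
    m = len(data_bits)
    r = 0
    while (1 << r) < m + r + 1:  # smallest r with 2^r >= m + r + 1
        r += 1
    n = m + r
    code = [0] * (n + 1)
    it = iter(data_bits)
    for pos in range(1, n + 1):
        if pos & (pos - 1):  # not a power of two: data position
            code[pos] = next(it)
    for i in range(r):
        p = 1 << i
        parity = 0
        for pos in range(1, n + 1):
            if pos & p:
                parity ^= code[pos]
        code[p] = parity  # make each parity group even
    code[0] = sum(code[1:]) % 2  # overall parity enables double-error detection
    return code

def decode(code):
    """Return (status, data_bits), where status is 'ok', 'corrected'
    or 'uncorrectable'."""
    code = list(code)
    n = len(code) - 1
    syndrome = 0
    for pos in range(1, n + 1):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = sum(code) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1 and syndrome <= n:
        code[syndrome] ^= 1  # single flip; syndrome 0 means the parity bit itself
        status = "corrected"
    else:
        status = "uncorrectable"  # even number of flips, or a nonsensical syndrome
    data = [code[pos] for pos in range(1, n + 1) if pos & (pos - 1)]
    return status, data
```

With 64 data bits, `encode` returns 72 bits, which matches the eight extra bits mentioned upthread. Flipping any single bit of the codeword yields `"corrected"` with the original data, while flipping two bits yields `"uncorrectable"`, which is the case where a system might halt. Decoding is only a handful of XORs over the word, so it maps onto shallow combinational logic, unlike the iterative decoding that LDPC needs.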
That it’s common in the other peripherals really says something.