by glowingly 1335 days ago
What I am curious about is that some Nvidia cards, like the A2000, have ECC but only enough chips for a regular, round amount of RAM, like 6GB or 12GB. So when ECC is enabled, 6.25% of the RAM is used for the ECC bits. [0; 1, in the notes]

Since desktop ECC gets around this by physically having more RAM ICs (usually 9 instead of 8, for example), what is the impediment to a solution similar to Nvidia's? I'd readily take a hit to memory capacity* and performance in exchange for ECC.

Why can't the memory controller already do this?

I should note, I'm mostly thinking of my NAS. I know ZFS can be run without ECC and some consumer solutions do. However, it seems ZFS should be run with ECC. I've already experienced observable bitrot with older images and video files, I'd rather not let it progress.

[*] in this case, 12.5% if we follow typical desktop ECC allocations

[0] https://www.nvidia.com/content/Control-Panel-Help/vLatest/en...

[1] https://docs.nvidia.com/cuda/cuda-c-best-practices-guide/ind...


The primary problem is that on CPU memory systems, all the requests are always 64 bytes, and the entire system starting from the CPU caches and ending at the arrays in the DIMMs is designed to efficiently serve those requests at the lowest possible latency.

In-band ECC means significant sacrifice of performance on a system not designed for it. Random read throughput doesn't go down by 6.25%, it goes down by half.
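A minimal sketch of the arithmetic behind that claim. It assumes a hypothetical in-band layout where each 64-byte line needs 4 check bytes (the 1/16 ratio behind the 6.25% figure above), the check bytes for 16 consecutive lines share one separate 64-byte "ECC line", and the controller caches only the most recently fetched ECC line. Sequential reads amortize one extra burst over 16 data lines. Scattered reads pay a full extra burst every time.

```python
LINE_BYTES = 64
ECC_BYTES_PER_LINE = 4  # assumption: 1/16 ratio, matching the 6.25% figure
LINES_PER_ECC_LINE = LINE_BYTES // ECC_BYTES_PER_LINE  # 16 data lines share one ECC line


def read_efficiency(line_addrs):
    """Useful data bursts divided by total bursts, with a one-entry ECC-line cache."""
    if not line_addrs:
        return 1.0
    bursts = 0
    cached_ecc_line = None
    for addr in line_addrs:
        bursts += 1  # the 64-byte data burst itself
        ecc_line = addr // LINES_PER_ECC_LINE
        if ecc_line != cached_ecc_line:
            bursts += 1  # a second, full 64-byte burst just to fetch 4 check bytes
            cached_ecc_line = ecc_line
    return len(line_addrs) / bursts


sequential = list(range(1600))
scattered = [i * LINES_PER_ECC_LINE for i in range(1600)]  # every read needs a new ECC line
print(f"sequential: {read_efficiency(sequential):.3f}")  # 0.941 (16/17, about 6% lost)
print(f"scattered:  {read_efficiency(scattered):.3f}")   # 0.500 (half the throughput)
```

Real controllers may cache more ECC lines, which helps access patterns with some locality but not truly random ones.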

> In-band ECC means significant sacrifice of performance on a system not designed for it. Random read throughput doesn't go down by 6.25%, it goes down by half.

But adjusting DDR for that could be pretty easy. Instead of a burst of 16 transfers, do 18. It's already set up to stream longer transfers when desired.

There will be more overhead than making the sizes properly match, but it shouldn't be anywhere near cutting throughput in half.
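The numbers behind "16 vs. 18 transfers", as a sketch assuming a DDR5 subchannel with 32 data bits and no dedicated ECC lanes: a BL16 burst moves exactly one 64-byte line, and two extra beats would carry 8 more bytes, the same 8-check-bytes-per-64 ratio as conventional ECC DIMMs.

```python
SUBCHANNEL_DATA_BITS = 32  # DDR5 subchannel width, excluding any ECC lanes


def burst_bytes(burst_length, width_bits=SUBCHANNEL_DATA_BITS):
    """Bytes moved by one burst: number of beats times bus width."""
    return burst_length * width_bits // 8


standard = burst_bytes(16)  # one 64-byte cache line
extended = burst_bytes(18)  # the proposed longer burst
spare = extended - standard
print(standard, extended, spare, spare / standard)  # 64 72 8 0.125
```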

> But adjusting DDR for that could be pretty easy. Instead of a burst of 16 transfers, do 18. It's already set up to stream longer transfers when desired.

That's not really how DDR5 works. The granularity of column addresses is (iirc) 32 bytes, and you cannot do transfers of any length other than 64 or 32 bytes (and 32 bytes only with burst chop, which means that the bank is busy for the remaining 8 cycles). Bursts longer than 16 are really just multiple adjacent requests, with an optimized command.

You could change this by completely changing how the memory modules themselves work, and by widening the column address for more granularity. You can't do it well by just tweaking the memory controllers.

I feel like changing the width of some of the IO components on the modules is closer to "tweaking" than to "completely changing".

I wasn't trying to suggest you could do it by changing only the memory controllers and not the DIMMs.

Ah. If you make any silicon changes at all, it is orders of magnitude more expensive and "harder" than just using extra chips like normal ECC DRAM does.

I'm suggesting a change JEDEC could have made when incrementing the DDR number, basically.

Especially since the ECC overhead on DDR5 is so high.
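To put a number on that, here is a rough comparison that assumes the common module layouts. A DDR4 ECC DIMM adds 8 check bits to one 64-bit channel. An 80-bit DDR5 ECC DIMM adds 8 check bits to each of its two 32-bit subchannels. This sketch covers only those two layouts, not every module variant.

```python
def ecc_overhead(data_bits, check_bits):
    """Return (extra DRAM relative to data, fraction of raw DRAM holding check bits)."""
    return check_bits / data_bits, check_bits / (data_bits + check_bits)


# (data bits, check bits) per independently addressed channel; assumed layouts
layouts = {
    "DDR4 ECC DIMM (72-bit)": (64, 8),
    "DDR5 ECC DIMM (80-bit, per subchannel)": (32, 8),
}

for name, (data_bits, check_bits) in layouts.items():
    extra, share = ecc_overhead(data_bits, check_bits)
    print(f"{name}: +{extra:.1%} DRAM, {share:.1%} of raw capacity")
# DDR4 ECC DIMM (72-bit): +12.5% DRAM, 11.1% of raw capacity
# DDR5 ECC DIMM (80-bit, per subchannel): +25.0% DRAM, 20.0% of raw capacity
```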

Intel's chips already support operating in such a mode. See embedded Elkhart Lake and Tiger Lake parts.

(And nvidia Tegra does in-band ECC too)

By the way, the RTX 4090 doesn't have ECC disabled: https://techgage.com/article/nvidia-geforce-rtx-4090-the-new...

> I know ZFS can be run without ECC and some consumer solutions do. However, it seems ZFS should be run with ECC. I've already experienced observable bitrot with older images and video files, I'd rather not let it progress.

From my understanding, the only risk to your data from non-ECC is a bit flip in RAM, pre-checksum calculation. In that unlikely scenario, you commit bad data to disk as good data (with a valid checksum). Bitrot isn't a factor, at all.

This means that ECC RAM and ZFS are completely orthogonal concerns.

If your data is important enough to warrant ECC RAM, you should get ECC RAM whether you use ZFS or not.

If you want to use ZFS (for its volume management, compression, mirroring, healthchecks, whathaveyou), you should do so whether or not you have ECC RAM.
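A small illustration of the failure mode described above (a RAM bit flip before the checksum is computed). It uses SHA-256 from Python's `hashlib` as a stand-in for the filesystem's checksum. ZFS defaults to fletcher4, but the point holds for any checksum: the checksum faithfully describes the already-corrupted buffer, so later verification passes and nothing flags the bad data.

```python
import hashlib

original = bytearray(b"holiday-photo-2009.jpg contents")
buf = bytearray(original)
buf[3] ^= 0x04  # simulated RAM bit flip *before* the checksum is computed

stored_checksum = hashlib.sha256(buf).digest()  # written to disk alongside the block

# Later read: the block matches its checksum, so nothing looks wrong...
print(hashlib.sha256(buf).digest() == stored_checksum)  # True
# ...but the data is not what the application originally wrote.
print(bytes(buf) == bytes(original))  # False
```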

Most people who care enough about data integrity to use ZFS should also be using ECC RAM, for the same reason. That covers most, but not all, users.

If you’re using ZFS for other reasons, then you be you I guess.

That is bitrot: you save correct data and it’s not retrievable. The fact that it happens in RAM rather than on the storage media, controller, or I/O channel just makes it a different category.

It is also far, far more likely that an uncorrected bit flip happens outside the relatively small portion of time the kernel spends in filesystem code. This is not a ZFS-specific problem by any means.

> From my understanding, the only risk to your data from non-ECC is a bit flip in RAM, pre-checksum calculation. In that unlikely scenario, you commit bad data to disk as good data(valid checksum).

Wouldn't an option to do it twice, in different memory regions, be nice? I'm pretty sure that in many use cases sacrificing performance for greater reliability wouldn't be an issue. Given how many cores are available nowadays, it might not even have much impact on performance.
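A minimal sketch of that idea in software, with hypothetical helper names. Two copies let you detect a disagreement but not decide which copy is right. A third copy allows a bitwise majority vote that corrects any single corrupted copy. This is triple modular redundancy, a swapped-in technique rather than anything specified above, and it costs 3x the memory.

```python
def copies_agree(a: bytes, b: bytes) -> bool:
    """Two copies: a mismatch reveals corruption but not which copy is bad."""
    return a == b


def majority_vote(a: bytes, b: bytes, c: bytes) -> bytes:
    """Three copies: each output bit takes the value at least two copies agree on."""
    return bytes((x & y) | (x & z) | (y & z) for x, y, z in zip(a, b, c))


data = b"dedup table entry"
flipped = bytes([data[0] ^ 0x80]) + data[1:]  # one copy suffers a bit flip

print(copies_agree(data, flipped))                 # False: detected, not fixable
print(majority_vote(data, flipped, data) == data)  # True: corrected
```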

Also, are there any software solutions (like a kernel patch) that would do "software ECC"? I imagine the performance hit would be quite devastating in this case, but it could still be an acceptable trade-off for NAS-like systems, where you want lots of RAM for dedup and cache but the system isn't busy.
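To make "software ECC" concrete, here is a toy Hamming(7,4) code. Three parity bits protect four data bits, and the syndrome points at any single flipped bit. It is only an illustration: hardware ECC DIMMs use wider SECDED codes (8 check bits per 64 data bits on DDR4), and a kernel doing something like this on every memory access would indeed be very slow.

```python
def encode(nibble):
    """Encode 4 data bits as a 7-bit Hamming codeword (positions 1..7)."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    p1 = d1 ^ d2 ^ d4  # covers positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # covers positions 2, 3, 6, 7
    p3 = d2 ^ d3 ^ d4  # covers positions 4, 5, 6, 7
    return [p1, p2, d1, p3, d2, d3, d4]


def decode(code):
    """Correct up to one flipped bit and return the 4 data bits."""
    c = [0] + list(code)  # shift to 1-based positions
    syndrome = (c[1] ^ c[3] ^ c[5] ^ c[7]) \
        | ((c[2] ^ c[3] ^ c[6] ^ c[7]) << 1) \
        | ((c[4] ^ c[5] ^ c[6] ^ c[7]) << 2)
    if syndrome:
        c[syndrome] ^= 1  # the syndrome is the position of the flipped bit
    return c[3] | (c[5] << 1) | (c[6] << 2) | (c[7] << 3)


word = encode(0b1011)
word[4] ^= 1  # flip one bit "in memory"
print(decode(word) == 0b1011)  # True
```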

There is still a race condition: if you read data from disk into a buffer, make a copy of the buffer, then do 2 checksums, the bit flip can still occur before the 2nd copy is created.

Are you worried that the data is corrupted on disk but a random bit flip makes it look right?

Otherwise a bit flip that early during read shouldn't matter because you're checking it against the disk checksum.

If you don't have disk checksums then ECC memory is not where you should be putting effort to keep things safe.
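A sketch of why an early flip on the read path gets caught, assuming a hypothetical `read_block` callable and SHA-256 as a stand-in for the on-disk checksum. The freshly read buffer is verified against the checksum recorded when the block was written, so a corrupted buffer fails verification and can simply be read again.

```python
import hashlib


def read_verified(read_block, expected_digest, retries=3):
    """Re-read a block until it matches the checksum recorded at write time."""
    for _ in range(retries):
        buf = read_block()
        if hashlib.sha256(buf).digest() == expected_digest:
            return buf
    raise IOError("block failed checksum verification on every attempt")


# Hypothetical flaky read path: the first read is hit by a bit flip, the retry is clean.
good = b"block written to disk"
expected = hashlib.sha256(good).digest()
attempts = iter([bytes([good[0] ^ 0x01]) + good[1:], good])

print(read_verified(lambda: next(attempts), expected) == good)  # True
```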

On ZFS, I don't think there's any reason why it needs ECC more than any other filesystem [0].

[0] - https://jrs-s.net/2015/02/03/will-zfs-and-non-ecc-ram-kill-y...

It looks like ZFS should be run with ECC:

"...We further demonstrate that ZFS is less resilient to memory corruption, which can lead to corrupt data being returned to applications or system crashes..." - https://research.cs.wisc.edu/wind/Publications/zfs-corruptio...

"Please Use ZFS With ECC Memory" - https://louwrentius.com/please-use-zfs-with-ecc-memory.html

A fuller instance of the first quote: "Through careful and thorough fault injection, we show that ZFS is robust to a wide range of disk faults. We further demonstrate that ZFS is less resilient to memory corruption, which can lead to corrupt data being returned to applications or system crashes"

...i.e., ZFS is 'less resilient' in comparison to its robust disk fault handling, not that it's less resilient to memory corruption in comparison to other filesystems. The parent quotation above implies that ZFS is more sensitive to memory corruption than other filesystems, but that is not claimed in the referenced paper.

Presumably zfs is the most-mainstream fs which will actually notice and complain about in-memory bit flips.

I have always wondered exactly this.

Market segmentation on what platforms/controllers support ECC: fine, whatever. But market segmentation of what is an “ECC DIMM” vs. a “regular DIMM”? It makes no sense that the commodity memory manufacturers have any leverage to enforce that segmentation.

Is it just laziness on the part of the platform vendors (who do have leverage) that they don't simply allow ECC with any DIMMs by giving over 1/k of the {bits, lines, pages, chips, whatever-granularity-they-reason-about} to parity?

It’s not the memory providers but the chipset manufacturers like Intel pushing customers to the expensive workstation/server lineups.

On the Intel side, yes.

On the AMD side, no.

However, Intel guarantees ECC will work on their "workstation" chipsets. AMD doesn't guarantee ECC will work on their desktop/workstation chipsets; you have to go up to Epyc to find guaranteed, tested ECC support.

I’m constantly surprised that it’s not commonplace to use on-disk parity files.

It’s so uncommon that the PAR3 format was never really finished and no one has created a replacement that handles subfolders.

Why I’m surprised: Not only does it solve the problem of bit-rot, but the parity files can be moved to USB sticks, NAS drives, mobile devices, etc., and the original files can be verified/repaired by any device that understands the parity file format. PAR2 is still great for photos/audio/video, as well as any flat-folder assets.
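The core idea of recovery data that travels with the files can be sketched with plain XOR parity, the same trick RAID 5 uses: one parity block lets you rebuild any one lost block. This is a deliberately simpler, swapped-in technique. PAR2 itself uses Reed-Solomon coding, which can rebuild as many damaged blocks as there are recovery blocks. The sketch assumes equal-length blocks and a known damaged position (marked `None`), whereas PAR2 locates damaged blocks using per-block checksums.

```python
from functools import reduce


def xor_parity(blocks):
    """XOR equal-length blocks together, byte by byte."""
    return bytes(reduce(lambda acc, byte: acc ^ byte, column) for column in zip(*blocks))


def rebuild(damaged, parity):
    """Replace the single None entry in damaged using the parity block."""
    missing = damaged.index(None)
    survivors = [b for b in damaged if b is not None]
    repaired = list(damaged)
    repaired[missing] = xor_parity(survivors + [parity])
    return repaired


blocks = [b"IMG_0001", b"IMG_0002", b"IMG_0003"]
parity = xor_parity(blocks)  # could live on a USB stick, a NAS, etc.

print(rebuild([b"IMG_0001", None, b"IMG_0003"], parity))
# [b'IMG_0001', b'IMG_0002', b'IMG_0003']
```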

PAR is somewhat unwieldy to use. In addition to needing to explicitly create it (and it not being particularly fast, on a large enough data set), PAR2s can't be 'updated'. The PAR3 spec allows for some limited updating, but it's far from ideal.

It often makes more sense for the file system to deal with ECC in my opinion. PAR probably makes more sense for archived files that aren't expected to change, but may be moved across file systems.

PAR2 handles subfolders by the way, just not empty folders.

Not exactly: The current PAR format does not make sense for this use-case (including because of the limitations you mentioned), but IMO the technology does.

Files with on-disk ECC can be moved from cloud to cloud, cloud to desktop, filesystem to filesystem, desktop to stick, then stick to NAS all without losing ECC protection. No single filesystem can do that.

Sorry if I'm dense, but what does "this use-case" exactly refer to here?

Fair question. What I’m referring to is file backup and archive for anything up to enterprise level.

So specifically: photography archives, videos (including b-roll for content producers/videographers), project backups, personal files, important documents, etc. Up to and including anything that could be posted to r/DataHoarder.

Ah, PAR makes the most sense for archival material like that. What were you looking for in the PAR format that'd make more sense for this use case?

The impediment to adding more chips is the same as it is without ECC: more die space/heat/power. The reason they use a standard number of chips, I expect, is that it's easier to manage and GPUs don't care very much about weirder access sizes.

I'm currently thinking of purchasing a Synology NAS that comes with BTRFS. Just wondering, do you happen to know if BTRFS also requires ECC RAM to function correctly?

Bit flips and data corruption affect all filesystems, and all systems stand to benefit from ECC, as shown by Linus' experience.

Depending on the model, most of the larger-capacity Synologys require ECC when expanding memory (the 1821+ did for me recently).