Hacker News new | ask | show | jobs
by throwaway77385 119 days ago
> spinning disks have been replaced by NVMe solid state drives with near-RAM I/O bandwidth

Am I missing something here? Even Optane is an order of magnitude slower than RAM.

Yes, under ideal conditions, SSDs can have very fast linear reads, but IOPS / latency have barely improved in recent years. And that's what really makes a difference.

Of course, compared to spinning disks, they are much faster, but the comparison to RAM seems wrong.

In fact, for applications like AI, even using system RAM is often considered too slow, simply because of the distance to the GPU, so VRAM needs to be used. That's how latency-sensitive some applications have become.

2 comments

>for applications like AI, even using system RAM is often considered too slow, simply because of the distance to the GPU

That's not why. It's because RAM has a narrower bus than VRAM. If it was a matter of distance it'd just have greater latency, but that would still give you tons of bandwidth to play with.

You could be charitable and say the bus is narrow because it has to travel a long distance and this makes it hard to have a lot of traces.
It's not. It's narrow even between the CPU and RAM. That's just the way x86 is designed. Nvidia and AMD by contrast have the luxury of being able to rearchitect their single-board computers each generation as long as they honor the PCIe interface.

It is also true that having a 384-bit memory bus shared with the video card would necessitate a redesigned PCIe slot as well as an outrageous number of traces on the motherboard, though.

Traditionally, the width of the GPU memory interfaces was many times greater than that of CPUs.

However the maximum width in consumer GPUs, of up to 1024-bit, has been reached many years ago.

Since then the width of the memory interfaces in consumer GPUs has been decreasing continuously, and this decrease has been only partially compensated by higher memory clock frequencies. This reduction has been driven by NVIDIA, in order to increase their profit margins by reducing the memory cost.

Nowadays, most GPU owners must be content with a memory interface no better than 192-bit, like in RTX 5070, which is only 50% wider than for a desktop CPU and much narrower than for a workstation or server CPU.

The reason why using the main memory in GPUs is slow has nothing to do with the width of the CPU memory interface, but it is caused by the fact that the GPU accesses the main memory through PCIe, so it is limited by the throughput of at most 16 PCIe lanes, which is much lower than that of either the GPU memory interface or the CPU memory interface.

ThreadRipper has 8 memory channels versus 2 for a desktop AMD CPU. It's not an x86 limitation.
"x86" as in the computer architecture, not the ISA. Why do you think they put extra channels instead of just having a single 512-bit bus?
The memory interface of CPUs is made wider by adding more channels because there are no memory modules with a 512-bit interface. Thus you must add multiples of the module width to the CPU memory interface.

This has nothing to do with x86, but it is determined by the JEDEC standards for DRAM packages and DRAM modules. The ARM server CPUs use the same number of memory channels, because they must use the same memory modules.

A standard DDR5 memory module has a width of the memory interface that is of 64-bit or 72-bit or 80-bit, depending on how many extra bits may be available for ECC. The interface of a module is partitioned in 2 channels, to allow concurrent accesses at different memory addresses. Despite the fact that the current memory channels have a width of 32-bit/36-bit/40-bit, few people are aware of this, so by "memory channel" most people mean 64 bits (or 72-bit for ECC), because that was the width of the memory channel in older memory generations.

Not counting ECC bits, most desktop and laptop CPUs have an 128-bit memory interface, some cheaper server and workstation CPUs have a 256-bit memory interface, many server CPUs and some workstation CPUs have a 512-bit memory interface, while the state-of-the-art server CPUs have a 768-bit memory interface.

For comparison, RTX 5070 has a 192-bit memory interface, RTX 5080 has a 256-bit memory interface and RTX 5090 has a 512-bit memory interface. However, the GDDR7 memory has a transfer rate that is 4 to 5 times higher than DDR5, which makes the GPU interfaces faster, despite their similar or even lower widths.

I can't edit my comment, but to the people responding here, thank you for adding all this information. It really helped elucidate why VRAM vs RAM is a distinction and also prevents my somewhat naive interpretation from being the only thing people see. Thanks!