Hacker News
by dragontamer 1537 days ago
I'm no expert on PCIe, but it's been described to me as a network.

PCIe has switches, addresses, and so forth, much like IP addresses on a network, except that PCIe operates at a significantly faster level.

At its lowest level, PCIe x1 is a single "lane": a single serial stream of zeros and ones (with various framing / error correction on top). PCIe x2, x4, x8, and x16 are simply 2, 4, 8, or 16 lanes running in parallel, with each packet's bytes striped across them.
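To make the lane idea concrete, here is a toy Python sketch (the names `stripe`, `unstripe`, and `link_gb_per_s` are hypothetical) of a packet's bytes being dealt round-robin across lanes, plus the raw-bandwidth arithmetic. It leaves out framing tokens, scrambling, and the line encoding itself. The bandwidth figures use the published transfer rates and encodings (8b/10b for Gen 1 and 2, 128b/130b for Gen 3 to 5) and ignore packet-header overhead, so treat them as upper bounds.

```python
# Toy model: bytes of a packet are dealt round-robin across the lanes of a
# link. Framing, scrambling and line encoding are deliberately left out.

def stripe(payload: bytes, lanes: int) -> list[bytes]:
    """Byte 0 goes to lane 0, byte 1 to lane 1, ..., wrapping around."""
    return [payload[i::lanes] for i in range(lanes)]


def unstripe(streams: list[bytes]) -> bytes:
    """Receiver side: interleave the per-lane streams back into one stream."""
    out = bytearray()
    for i in range(max(len(s) for s in streams)):
        for s in streams:
            if i < len(s):  # trailing lanes may be one byte short
                out.append(s[i])
    return bytes(out)


# Raw per-direction bandwidth: GT/s * encoding efficiency / 8 bits * lanes.
RATE_GT_S = {1: 2.5, 2: 5.0, 3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = {1: 8 / 10, 2: 8 / 10, 3: 128 / 130, 4: 128 / 130, 5: 128 / 130}


def link_gb_per_s(gen: int, lanes: int) -> float:
    return RATE_GT_S[gen] * ENCODING[gen] / 8 * lanes


per_lane = stripe(b"HELLO, PCIe!", 4)   # [b'HOC', b'E,I', b'L e', b'LP!']
print(round(link_gb_per_s(3, 16), 2))   # 15.75 (PCIe 3.0 x16, GB/s)
```

Doubling the lane count doubles the bytes moved per symbol time, which is why x16 is roughly sixteen times x1 for the same generation.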

-------

However, PCIe is a very large and complex protocol. Its "serial" communication can be abstracted into memory-mapped I/O (MMIO): instead of programming at the "packet" level, most PCIe operations look like ordinary RAM reads and writes.

> even virtual memory

So you understand virtual memory? PCIe's abstractions go all the way up to, and include, the virtual memory system. When your OS maps a range of virtual addresses to a PCIe device, programs' reads and writes to those addresses are translated (by the MMU, using page tables the OS set up, and then by the PCIe bridge) into PCIe messages.
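A toy model of that translation path might look like the following. Everything here is hypothetical (`ToySystem`, `Tlp`, the addresses, the 1 MiB BAR size): the OS fills in page-table entries once, the MMU translates each virtual address to a physical one, and if the physical address falls inside a device's BAR window the bridge emits a memory-write request instead of touching DRAM. Real transaction-layer packets carry headers, requester IDs, and completions for reads, all omitted here.

```python
from dataclasses import dataclass
from typing import Optional

PAGE_SIZE = 4096


@dataclass
class Tlp:
    kind: str                    # request type, e.g. "MemWr"
    device: str                  # device whose BAR window claimed the address
    offset: int                  # offset inside that window
    data: Optional[int] = None   # payload carried by a write


class ToySystem:
    """Hypothetical model: OS page table + MMU + a bridge that decodes
    physical addresses into either DRAM accesses or PCIe requests."""

    def __init__(self) -> None:
        self.page_table: dict[int, int] = {}        # virtual page -> physical page
        self.bars: list[tuple[int, int, str]] = []  # (physical base, size, device)
        self.dram: dict[int, int] = {}              # physical address -> value
        self.wire: list[Tlp] = []                   # requests sent over "PCIe"

    def map_pages(self, vaddr: int, paddr: int, size: int) -> None:
        """Done once by the OS: fill in page-table entries."""
        for off in range(0, size, PAGE_SIZE):
            self.page_table[(vaddr + off) // PAGE_SIZE] = (paddr + off) // PAGE_SIZE

    def add_bar(self, base: int, size: int, device: str) -> None:
        self.bars.append((base, size, device))

    def translate(self, vaddr: int) -> int:
        """Done by the MMU on every access (a KeyError models a page fault)."""
        ppage = self.page_table[vaddr // PAGE_SIZE]
        return ppage * PAGE_SIZE + vaddr % PAGE_SIZE

    def store(self, vaddr: int, value: int) -> None:
        paddr = self.translate(vaddr)
        for base, size, device in self.bars:
            if base <= paddr < base + size:  # address belongs to a device
                self.wire.append(Tlp("MemWr", device, paddr - base, value))
                return
        self.dram[paddr] = value             # ordinary RAM, nothing on the wire


machine = ToySystem()
machine.add_bar(0xF000_0000, 0x10_0000, "gpu")          # 1 MiB BAR
machine.map_pages(0x7000_0000, 0xF000_0000, 0x10_0000)  # program's view of it
machine.map_pages(0x1000_0000, 0x0010_0000, PAGE_SIZE)  # one page of RAM
machine.store(0x7000_0010, 42)  # -> Tlp("MemWr", "gpu", 0x10, 42) on the wire
machine.store(0x1000_0008, 7)   # -> plain DRAM write at 0x100008
```

From the program's side both stores look identical; only the address decode decides whether a packet goes out.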

--------

Handwaving a few details: GPUs do the same thing on their end. GPUs can also have a "virtual memory" that they read from and write to, which is translated into PCIe messages.

This leads to a system called "Shared Virtual Memory" (SVM), which has become very popular in GPGPU programming circles. When the CPU (or GPU) reads or writes a memory address, the data is automatically copied over to the other device as needed. Caching layers sit on top to improve efficiency. Some SVM lives on the CPU side: the GPU fetches the data and keeps it in its own local memory / caches, but always relies on the CPU as the "main owner" of the data. The reverse, GPU-side shared memory, also exists, where the GPU owns the data and the CPU reaches it over PCIe.
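As a rough illustration of the CPU-owned flavour, this Python sketch (hypothetical `HomeMemory` and `RemoteView` classes, 4 KiB pages, and a write-through policy chosen only for simplicity) has a remote device pull whole pages on first touch and serve later reads from its local copy. It deliberately has no invalidation: if the owner changes a page after the remote side cached it, the copy goes stale, which is exactly the coherence problem real SVM implementations have to solve.

```python
PAGE_SIZE = 4096


class HomeMemory:
    """The data's 'main owner', e.g. CPU-side RAM."""

    def __init__(self) -> None:
        self.pages: dict[int, bytearray] = {}

    def page(self, n: int) -> bytearray:
        return self.pages.setdefault(n, bytearray(PAGE_SIZE))


class RemoteView:
    """Another device (e.g. a GPU) reaching the owner's memory through a
    local page cache. Write-through, and no invalidation, for simplicity."""

    def __init__(self, home: HomeMemory) -> None:
        self.home = home
        self.cache: dict[int, bytearray] = {}
        self.fetches = 0  # whole-page transfers that crossed the "bus"

    def _local(self, addr: int) -> tuple[bytearray, int]:
        n = addr // PAGE_SIZE
        if n not in self.cache:  # first touch: copy the page over
            self.cache[n] = bytearray(self.home.page(n))
            self.fetches += 1
        return self.cache[n], addr % PAGE_SIZE

    def read(self, addr: int) -> int:
        page, off = self._local(addr)
        return page[off]

    def write(self, addr: int, value: int) -> None:
        page, off = self._local(addr)
        page[off] = value
        self.home.page(addr // PAGE_SIZE)[off] = value  # keep the owner current


home = HomeMemory()
home.page(0)[5] = 99         # the owner writes its own memory directly
gpu = RemoteView(home)
gpu.read(5)                  # 99, fetched over the bus (fetches == 1)
gpu.read(100)                # same page, served locally (fetches still 1)
gpu.write(PAGE_SIZE + 1, 3)  # new page: second fetch, then write-through
```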

To coordinate access to RAM properly, atomic operations (fetch-and-add, swap, and compare-and-swap) and memory barriers have been added in PCIe 3.0+. So you can perform a "compare-and-swap" on shared virtual memory, and read/write these virtual memory locations in a standardized way across all PCIe devices.
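Compare-and-swap semantics can be sketched in Python: a lock stands in for the indivisibility a hardware atomic provides, and each thread plays the role of a device updating one shared word. `SharedWord` and `atomic_increment` are hypothetical names. Like a PCIe compare-and-swap request, `compare_and_swap` returns the original value, and the caller compares it with what it expected to learn whether its swap took effect.

```python
import threading


class SharedWord:
    """Stand-in for one word of shared memory. The lock makes
    compare_and_swap indivisible, the guarantee a hardware atomic gives."""

    def __init__(self, value: int = 0) -> None:
        self._value = value
        self._lock = threading.Lock()

    def load(self) -> int:
        return self._value

    def compare_and_swap(self, expected: int, new: int) -> int:
        """Write `new` only if the word still holds `expected`; return the old value."""
        with self._lock:
            old = self._value
            if old == expected:
                self._value = new
            return old


def atomic_increment(word: SharedWord) -> None:
    """Classic CAS retry loop: if another agent got there first, re-read and retry."""
    while True:
        seen = word.load()
        if word.compare_and_swap(seen, seen + 1) == seen:
            return


counter = SharedWord()


def worker() -> None:  # each thread plays the role of one device
    for _ in range(10_000):
        atomic_increment(counter)


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter.load())  # 40000
```

The retry loop is the standard way to build other read-modify-write operations out of CAS; with a native fetch-and-add the loop collapses into a single request.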

PCIe 4.0 and PCIe 5.0 keep adding features, making PCIe feel more and more like a "shared memory system", akin to the cache-coherence strategies that multi-socket CPU systems use to share RAM with each other. In the long term, I expect future PCIe standards to push the interface even further toward this "dual-CPU-socket" memory-sharing paradigm.

This is great because you can have 2 CPUs + 4 GPUs in one system, and when GPU#2 writes to address 0xF1235122, the shared-virtual-memory system automatically translates that to its "physical" location (wherever it is), and the lower-level protocols deliver the data to the correct place without any help from the programmer.

This means that a GPU can do things like traverse a linked list (or tree) even if the nodes are spread across CPU#1, CPU#2, GPU#4, and GPU#1. The shared-virtual-memory paradigm hides where each node lives and lets the PCIe 3.0 / 4.0 / 5.0 protocols handle the details automatically.
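The traversal point can be sketched with a single address space carved up among four agents (the address ranges and the `owner`, `store_node`, and `load_node` helpers are hypothetical). `traverse` only ever follows addresses; routing each access to the right device is the job of `load_node`, standing in for the SVM and PCIe layers. The owner of each hop is recorded purely so the example can show where the nodes lived.

```python
from typing import Optional

# Hypothetical split of one shared address space among four agents.
REGIONS = [
    (0x0000_0000, 0x4000_0000, "CPU#1"),
    (0x4000_0000, 0x8000_0000, "CPU#2"),
    (0x8000_0000, 0xC000_0000, "GPU#1"),
    (0xC000_0000, 0x1_0000_0000, "GPU#4"),
]
# Each agent's local memory: address -> (value, address of next node or None)
memory: dict[str, dict[int, tuple[int, Optional[int]]]] = {
    name: {} for _, _, name in REGIONS
}


def owner(addr: int) -> str:
    for lo, hi, name in REGIONS:
        if lo <= addr < hi:
            return name
    raise ValueError(f"unmapped address {addr:#x}")


def store_node(addr: int, value: int, next_addr: Optional[int]) -> None:
    memory[owner(addr)][addr] = (value, next_addr)


def load_node(addr: int) -> tuple[int, Optional[int]]:
    """Stand-in for the SVM/PCIe layers: route the access to the owner."""
    return memory[owner(addr)][addr]


def traverse(head: Optional[int]) -> tuple[list[int], list[str]]:
    """Plain pointer chasing; nothing here knows which device holds a node."""
    values, hops = [], []
    addr = head
    while addr is not None:
        value, next_addr = load_node(addr)
        values.append(value)
        hops.append(owner(addr))  # recorded only to show where each node lived
        addr = next_addr
    return values, hops


store_node(0x0000_1000, 1, 0x4000_2000)
store_node(0x4000_2000, 2, 0xC000_0040)
store_node(0xC000_0040, 3, 0x8000_0100)
store_node(0x8000_0100, 4, None)
print(traverse(0x0000_1000))
# ([1, 2, 3, 4], ['CPU#1', 'CPU#2', 'GPU#4', 'GPU#1'])
```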

2 comments

I agree that PCIe is mostly shared memory system.

But for video cards this sharing is unequal: their RAM exceeds the 32-bit address space, and many mainboards still in use have a 32-bit PCIe controller, so all PCIe addresses must fit inside the 4 GB address space. On Windows machines this shows up as usable memory being less than the total installed, by approximately 0.5 GB, of which 256 MB is the video RAM access window.

So in most cases the rule still holds that a video card shares all of its memory through a 256 MB window using bank switching.
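Bank switching boils down to a division: the bank number selects which 256 MB slice of VRAM the window currently shows, and the remainder is the offset inside the window. The sketch below (hypothetical `window_address` and `BankedAperture`, with a dict standing in for VRAM) also counts how often the bank register would have to be reprogrammed, which is the cost of touching widely scattered addresses through one window.

```python
WINDOW_SIZE = 256 * 1024 * 1024  # the 256 MB access window


def window_address(vram_offset: int) -> tuple[int, int]:
    """Which bank must be selected, and where in the window the byte appears."""
    return divmod(vram_offset, WINDOW_SIZE)


class BankedAperture:
    """Toy CPU-side view of VRAM: one window plus one bank-select register."""

    def __init__(self) -> None:
        self.bank = 0
        self.switches = 0               # times the bank register was reprogrammed
        self.vram: dict[int, int] = {}  # sparse stand-in for the card's memory

    def write(self, vram_offset: int, value: int) -> None:
        bank, offset = window_address(vram_offset)
        if bank != self.bank:           # select the right 256 MB slice first
            self.bank = bank
            self.switches += 1
        self.vram[self.bank * WINDOW_SIZE + offset] = value


print(window_address(5 * 2**30 + 12))  # (20, 12): bank 20, 12 bytes in
ap = BankedAperture()
for off in (0, 100, WINDOW_SIZE + 1, 5):
    ap.write(off, 1)
print(ap.switches)                      # 2: into bank 1 and back to bank 0
```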

As for the GPU reading main system memory, this is usually not worthwhile, because VRAM is orders of magnitude faster, even before considering that other devices, like HDDs/SSDs, also consume bus bandwidth.

And in most cases, the only use of GPU access to main system memory is the traditional reading of textures (for 3D accelerators) from system memory. For example, ALL 3D software that uses GPU rendering can only use video RAM for this; none of it uses system RAM.

Yeah, as I've read other responses to my post I've been able to better define my difficulty in understanding CPU-GPU communication. I was having a hard time separating the MMIO concept from the communications protocol that ties all of these devices together (based on what you've explained, that'd be PCIe). I actually haven't learned about PCIe yet, so the way you've introduced the concept has set me up to look into it further. Thanks.