| The Unified Memory pool is what will continue to be the “game changer” in systems architecture, especially outside of data centers. The reality is even cutting-edge games and consumer workloads don’t actually make full use of the PCIe bandwidth of the GPU or the bandwidth of its GDDR memory. Even local AI use cases don’t substantially or meaningfully benefit from faster memory, at least for average consumers. A unified memory pool does two things: 1) Lets systems optimize utilization based on need, rather than be confined to specific pools 2) Reduce overall memory cost, by letting system builders purchase a single type of memory in bulk instead of having to figure out GDDR vs DDR memory placement (important for SFF/portable machines). So at a time when memory is expensive, unified pools make more sense. Even when memory becomes cheap and plentiful again, it’s just practical at this point to allocate a larger overall pool instead of managing discrete sets. The one big drawback is security. A shared memory pool means side-channel attacks against memory from the GPU or CPU could potentially compromise the other as well, meaning memory-safe designs are going to be critical to security going forward (which is good for Rust adherents, I figure). |
The trouble with this is that the different types of memory have different characteristics. Latency for ordinary system memory is actually better than it is for GDDR, because GDDR is optimized for bandwidth. The RTX 5090 has 1.8TB/s of memory bandwidth with a 512-bit memory bus. DDR5-9600 on the same bus width would have better latency but only about a third of the bandwidth.
CPU workloads are generally bounded by latency and GPU workloads are generally bounded by bandwidth, which is why they use two different types.
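To make the "about a third" figure concrete, here is a rough back-of-the-envelope sketch. It assumes theoretical peak bandwidth is simply bus width times per-pin transfer rate, and it assumes the RTX 5090's GDDR7 runs at 28 Gbps per pin (a number from public spec sheets, not from the comment above). The function name is made up for illustration.

```python
def peak_bandwidth_gbs(bus_width_bits: int, rate_mtps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits / 8 gives bytes moved per transfer;
    rate_mtps is millions of transfers per second per pin.
    """
    return bus_width_bits / 8 * rate_mtps / 1000


# Assumption: 28 Gbps per pin GDDR7 on a 512-bit bus (RTX 5090 class).
gddr7_gbs = peak_bandwidth_gbs(512, 28_000)

# DDR5-9600 on the same 512-bit bus width.
ddr5_gbs = peak_bandwidth_gbs(512, 9_600)

ratio = ddr5_gbs / gddr7_gbs

print(f"GDDR7: {gddr7_gbs:.1f} GB/s")
print(f"DDR5:  {ddr5_gbs:.1f} GB/s")
print(f"DDR5 / GDDR7: {ratio:.2f}")
```

Under those assumptions the GDDR7 figure comes out near 1.8TB/s and DDR5-9600 at roughly 0.34 of that. Real sustained bandwidth is lower for both, so treat these as upper bounds.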
> Reduce overall memory cost, by letting system builders purchase a single type of memory in bulk instead of having to figure out GDDR vs DDR memory placement (important for SFF/portable machines)
The trouble with this is cost. In principle you could get the same 1.8TB/s of memory bandwidth as the RTX 5090, with the better latency of DDR5, by using DDR5 with a 1536-bit bus. This is indeed what multi-socket servers do, with two sockets and 768 bits of memory channels per socket, but now check how much those system boards cost.
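The 1536-bit figure can be checked with the same kind of arithmetic run in reverse. This is a sketch under assumptions: peak bandwidth only, and a 64-bit data width per DDR5 channel, so the bus is rounded up to whole channels. The function name and the `channel_bits` parameter are illustrative, not from any library.

```python
import math


def required_bus_width_bits(target_gbs: float, rate_mtps: float,
                            channel_bits: int = 64) -> int:
    """Smallest bus width (in whole channels) that reaches target_gbs.

    Assumes peak bandwidth = bus_width_bits / 8 * rate_mtps / 1000.
    """
    raw_bits = target_gbs * 1000 * 8 / rate_mtps
    channels = math.ceil(raw_bits / channel_bits)
    return channels * channel_bits


# Target: roughly the 1.8TB/s quoted for the RTX 5090, using DDR5-9600.
width = required_bus_width_bits(1800, 9_600)
channels_total = width // 64

print(width, "bits across", channels_total, "channels")
```

With those inputs the result is 1536 bits, which is 24 channels of 64 bits, or 12 channels per socket on a two-socket board.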
But the remaining alternatives are both worse. If you use GDDR for the unified memory, it costs more than DDR and the CPU gets significantly worse latency. If you use DDR without a bus 3-4 times wider than the GPU's already-wide one, the GPU gets starved for bandwidth.