by wtallis 934 days ago
PCIe latency is a few orders of magnitude lower than NAND flash read latency, so the extra round trip to the CPU's PCIe root complex doesn't matter.

Not latency, but bandwidth to the GPU, matters for asset loading. Can the GPU load assets directly from its own SSD, as the PS5 does, or is this just an SSD the processor can use as a "disk"?
Extra hops on the PCIe link have even less of an impact on bandwidth than on latency.

This product is exposing the SSD directly to the host CPU because that's the only way to make the SSD useful. There's approximately zero software infrastructure for directly accessing an SSD from a GPU; nobody's running an NVMe driver and filesystem code entirely on the GPU, and even if you did, that would be a non-starter in the consumer space because it would effectively require reserving the entire SSD for use by a single application.

It is possible on Linux to have GPU code issue storage requests over io_uring (if the kernel is polling that queue; the GPU cannot directly issue a syscall to make the kernel start checking for new IO requests). But that request still is handled on the CPU as it passes through the OS filesystem/storage stack before the NVMe SSD is instructed to DMA the requested data directly to/from the GPU's VRAM.

Microsoft's DirectStorage is (among other things) their effort to enable similar functionality for at least some use cases.

That’s unlikely given that there’s effectively no difference in framerates between x8 and x16 slots, even though the bandwidth doubles. Bandwidth is pretty clearly not the bottleneck.
PCIe bandwidth is not typically a bottleneck; it has kept ahead of what people need. 8 lanes of PCIe 4.0 is already about 16 GB/s.
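
For anyone who wants to check the 16 GB/s figure, here is a rough sketch of the arithmetic in Python. The per-lane transfer rates and line encodings are the published PCIe values; the function name and table are just made up for illustration, and the result ignores packet and flow-control overhead, so real throughput will be somewhat lower. Note that 8 lanes only reaches roughly 16 GB/s per direction at PCIe 4.0 (or 16 lanes at PCIe 3.0).

```python
# Approximate one-direction PCIe link bandwidth from generation and lane count.
# Only line-encoding overhead is modeled; TLP headers and flow control are
# ignored, so this is an upper bound rather than a measured throughput.

# generation -> (transfer rate per lane in GT/s, encoding efficiency)
PCIE_GENERATIONS = {
    1: (2.5, 8 / 10),      # 8b/10b encoding
    2: (5.0, 8 / 10),      # 8b/10b encoding
    3: (8.0, 128 / 130),   # 128b/130b encoding
    4: (16.0, 128 / 130),  # 128b/130b encoding
    5: (32.0, 128 / 130),  # 128b/130b encoding
}


def pcie_bandwidth_gb_per_s(generation: int, lanes: int) -> float:
    """Return the approximate bandwidth in GB/s for one direction of the link."""
    rate_gt_s, efficiency = PCIE_GENERATIONS[generation]
    # Each transfer carries one bit per lane; divide by 8 to convert bits to bytes.
    return rate_gt_s * efficiency * lanes / 8


if __name__ == "__main__":
    for generation, lanes in [(3, 8), (3, 16), (4, 8), (4, 16)]:
        bandwidth = pcie_bandwidth_gb_per_s(generation, lanes)
        print(f"PCIe {generation}.0 x{lanes}: {bandwidth:.2f} GB/s per direction")
```

With these numbers a PCIe 4.0 x8 link works out to about 15.75 GB/s per direction, the same as PCIe 3.0 x16.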