by ixmerof 934 days ago
I am not a pro in the subject, but does it mean the data still needs to reach the bridge on the CPU first? What I am thinking is that DX12 already offers direct access to SSDs for the fastest data copy to VRAM, so why couldn't this card benefit from dropping some hops when reaching for data that is already onboard?
4 comments

PCIe latency is a few orders of magnitude lower than NAND flash read latency, so the extra round trip to the CPU's PCIe root complex doesn't matter.
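A minimal sketch of that comparison follows. Both latency figures are assumptions chosen for illustration (roughly 0.5 µs for the extra PCIe round trip, roughly 60 µs for a NAND page read), not measurements of this card.

```python
import math

# Assumed, illustrative latencies in microseconds (not measured on this card).
PCIE_EXTRA_HOP_US = 0.5   # extra round trip through the CPU's PCIe root complex
NAND_READ_US = 60.0       # one NAND flash page read


def added_latency_fraction(hop_us: float, nand_us: float) -> float:
    """Fraction by which the extra hop lengthens a single flash read."""
    return hop_us / nand_us


def orders_of_magnitude(hop_us: float, nand_us: float) -> float:
    """How many powers of ten separate the NAND read from the extra hop."""
    return math.log10(nand_us / hop_us)


fraction = added_latency_fraction(PCIE_EXTRA_HOP_US, NAND_READ_US)
orders = orders_of_magnitude(PCIE_EXTRA_HOP_US, NAND_READ_US)
print(f"extra hop adds {fraction:.1%} to one read ({orders:.1f} orders of magnitude apart)")
```

With other plausible inputs the exact ratio changes, but the hop stays a small fraction of the flash read time.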
Not latency, but bandwidth to the GPU, matters for asset loading. Can the GPU load assets directly from its own SSD, as the PS5 does, or is this just an SSD the processor can use as a "disk"?
Extra hops on the PCIe link have even less of an impact on bandwidth than on latency.

This product is exposing the SSD directly to the host CPU because that's the only way to make the SSD useful. There's approximately zero software infrastructure for directly accessing an SSD from a GPU; nobody's running an NVMe driver and filesystem code entirely on the GPU, and even if you did, that would be a non-starter in the consumer space because it would effectively require reserving the entire SSD for use by a single application.

It is possible on Linux to have GPU code issue storage requests over io_uring (if the kernel is polling that queue; the GPU cannot directly issue a syscall to make the kernel start checking for new IO requests). But that request is still handled on the CPU as it passes through the OS filesystem/storage stack before the NVMe SSD is instructed to DMA the requested data directly to/from the GPU's VRAM.
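As a toy model of why polling matters here: the producer can only write entries into shared memory, so nothing happens until something on the CPU side looks at the queue. The sketch below is not the io_uring API; `SubmissionRing`, `submit`, `poll`, and the request strings are hypothetical names used only for illustration.

```python
class SubmissionRing:
    """Toy model of a polled submission queue (not the io_uring API)."""

    def __init__(self, size: int) -> None:
        self.slots = [None] * size
        self.head = 0  # next entry the poller will consume
        self.tail = 0  # next free slot for the producer

    def submit(self, request: str) -> bool:
        """Producer side: a plain memory write, no syscall involved."""
        if self.tail - self.head == len(self.slots):
            return False  # ring is full
        self.slots[self.tail % len(self.slots)] = request
        self.tail += 1
        return True

    def poll(self) -> list:
        """Consumer side: the CPU-side poller drains whatever has been queued."""
        drained = []
        while self.head < self.tail:
            drained.append(self.slots[self.head % len(self.slots)])
            self.head += 1
        return drained


ring = SubmissionRing(size=4)
ring.submit("read texture.bin -> VRAM")
ring.submit("read mesh.bin -> VRAM")
pending_before_poll = ring.tail - ring.head  # requests sit untouched until polled
handled = ring.poll()
print(pending_before_poll, handled)
```

In the real case the consumer is the kernel, which still runs each request through the filesystem and storage stack on the CPU before the SSD performs the DMA.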

Microsoft's DirectStorage is (among other things) their effort to enable similar functionality for at least some use cases.

That’s unlikely given that there’s effectively no difference in framerates between x8 and x16 slots, even though the bandwidth doubles. Bandwidth is pretty clearly not the bottleneck.
PCIe bandwidth is not typically a bottleneck; it has kept ahead of what people need. Eight lanes of PCIe 4.0 is already about 16 GB/s.
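For reference, the lane arithmetic can be sketched as below. It uses the published per-lane transfer rates and the 128b/130b line encoding of PCIe 3.0 through 5.0, and ignores packet and protocol overhead, so real throughput is somewhat lower. The function name `pcie_gb_per_s` is just an illustrative helper.

```python
# Per-lane transfer rate in GT/s for each PCIe generation.
GT_PER_S = {3: 8.0, 4: 16.0, 5: 32.0}
ENCODING = 128 / 130  # 128b/130b line encoding used by PCIe 3.0 through 5.0


def pcie_gb_per_s(gen: int, lanes: int) -> float:
    """Approximate one-direction bandwidth in GB/s, before protocol overhead."""
    return GT_PER_S[gen] * ENCODING * lanes / 8  # 8 bits per byte


for gen in (3, 4, 5):
    for lanes in (8, 16):
        print(f"PCIe {gen}.0 x{lanes}: {pcie_gb_per_s(gen, lanes):.2f} GB/s")
```

Under these assumptions a PCIe 4.0 x8 link and a PCIe 3.0 x16 link come out the same, at a little under 16 GB/s.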
Surely that data copy would not be on the fly; it would be preloaded. An SSD's ~8 GB/s could not keep up with the GPU's ~350 GB/s.
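A quick back-of-the-envelope version of that gap, using the two rates quoted above. The 8 GB asset size is a hypothetical figure picked only to make the numbers concrete.

```python
def seconds_to_move(size_gb: float, rate_gb_per_s: float) -> float:
    """Time to move size_gb of data at a sustained rate, ignoring all overhead."""
    return size_gb / rate_gb_per_s


ASSET_GB = 8.0     # hypothetical asset set, chosen for illustration
SSD_GB_S = 8.0     # SSD rate quoted in the comment
VRAM_GB_S = 350.0  # GPU memory rate quoted in the comment

ssd_seconds = seconds_to_move(ASSET_GB, SSD_GB_S)
vram_seconds = seconds_to_move(ASSET_GB, VRAM_GB_S)
slowdown = ssd_seconds / vram_seconds

print(f"from SSD: {ssd_seconds:.2f} s, from VRAM: {vram_seconds * 1000:.1f} ms, ratio {slowdown:.2f}x")
```

The ratio depends only on the two rates, not on the asset size, which is why streaming from the SSD every frame is not realistic and preloading is.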
Yes, it does, although DMA means the CPU will not need to process it. It's like any other M.2 drive in your PC.
Nvidia generally disables or does not use ReBAR, as it either shows no improvement or degrades performance. AMD may be a different story, but that isn't what this card is.
That’s very incorrect. Nvidia requires that every piece of firmware along the way be updated to support it, and won’t enable it unless they are. However, if they are, then depending on the game there’s definitely uplift.