> most of the compute time spent is in serializing and deserializing data. This should be viewed in light of how hardware is evolving. CPU compute power is no longer growing as much (at least for individual cores). But one thing that's still doubling on a regular basis is memory capacity of all kinds (RAM, SSD, etc.) and bandwidth of all kinds (PCIe lanes, networking, etc.). This divide is getting large and will only continue to grow. Which brings me to my main point: you can't be serializing/deserializing data on the CPU. What you want is to have the CPU coordinate the SSD to copy chunks directly, as is, to the NIC/app/etc. Short of having your RAM do compute work*, you would be leaving performance on the table.
>
> ----
>
> * Which is starting to appear (https://www.upmem.com/technology/), but it's not quite there yet.
Isn't that what DMA is supposed to be?
Also, there's work on getting GPUs to load data straight from NVMe drives, bypassing both the CPU and system memory. So you could certainly do similar things over the PCIe bus.
https://developer.nvidia.com/blog/gpudirect-storage/
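On the host side, a much more modest relative of this idea is Linux's `sendfile(2)`: the kernel moves file bytes to a socket without the application copying them through its own buffer. This is not GPUDirect and not device-to-device DMA; it only avoids the user-space round trip. A minimal Python sketch, assuming a Unix-like platform; the payload and the `socketpair()` standing in for a real network connection are made up for illustration:

```python
import os
import socket
import tempfile

# Hypothetical file contents; kept small so it fits in the socket buffer
# without needing a concurrent reader.
PAYLOAD = b"x" * 4096

def send_file_via_kernel(path):
    """Send a file over a socket with socket.sendfile(); return (bytes_sent, bytes_received)."""
    # A connected stream socket pair stands in for a NIC-backed connection.
    sender, receiver = socket.socketpair()
    try:
        with open(path, "rb") as f:
            # socket.sendfile() uses os.sendfile() where the platform supports it,
            # so the data goes from the kernel's page cache into the socket without
            # passing through a Python-level buffer. Where it is unsupported, it
            # falls back to ordinary send() calls.
            sent = sender.sendfile(f)
        sender.shutdown(socket.SHUT_WR)  # signal EOF to the receiving end
        chunks = []
        while True:
            chunk = receiver.recv(65536)
            if not chunk:
                break
            chunks.append(chunk)
        return sent, b"".join(chunks)
    finally:
        sender.close()
        receiver.close()

# Usage: write the payload to a temporary file, then push it through the socket.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(PAYLOAD)
    path = tmp.name
try:
    sent, received = send_file_via_kernel(path)
    print(sent, received == PAYLOAD)
finally:
    os.remove(path)
```

The CPU still orchestrates the transfer here; whether the NIC can then pull the bytes straight from memory via DMA depends on the driver and hardware.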
A big problem is that a lot of data isn't laid out in a way that's ready to be stuffed into memory. When you see a game spending a long time loading data, that's usually why: the CPU does a lot of processing to convert on-disk data structures into a more efficient in-memory representation.
If you can change the on-disk representation to more closely match what's in memory, then CPUs are generally more than fast enough to copy the bytes around. They can definitely copy faster than system RAM can feed them.
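To make the contrast concrete, here is a small Python sketch comparing a format that must be parsed value by value with one whose bytes already match the in-memory layout. The data set and function names are hypothetical, chosen only for illustration:

```python
import array

# Hypothetical data set, e.g. vertex coordinates from a game asset.
values = [i * 0.5 for i in range(10_000)]

# Text encoding: loading it means tokenizing and converting every value on the CPU.
text_blob = " ".join(repr(v) for v in values).encode("ascii")

def load_text(blob):
    # One parse per value: the kind of per-element work that makes loading slow.
    return [float(tok) for tok in blob.split()]

# Binary encoding whose layout matches the in-memory representation:
# a contiguous run of native-endian C doubles.
binary_blob = array.array("d", values).tobytes()

def load_binary(blob):
    arr = array.array("d")
    arr.frombytes(blob)  # a bulk byte copy into the array, no per-value parsing
    return arr

def view_binary(blob):
    # No copy at all: reinterpret the existing bytes as an array of doubles.
    return memoryview(blob).cast("d")

print(load_text(text_blob)[:3])
print(load_binary(binary_blob)[:3].tolist())
print(view_binary(binary_blob)[:3].tolist())
```

The binary layout here uses the machine's native byte order, so a real file format would need to pin endianness (or call `array.byteswap()` on mismatched machines). The same `memoryview(...).cast("d")` trick also works on an `mmap` of a file, which lets the OS page the data in without an explicit read.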