
by BenoitP 1968 days ago

> most of the compute time spent is in serializing and deserializing data.

This should be viewed in light of how hardware is evolving now. CPU compute power is no longer growing as much (at least for individual cores).

But one thing that's still doubling on a regular basis is memory capacity of all kinds (RAM, SSD, etc) and bandwidth of all kinds (PCIe lanes, networking, etc). This divide is getting large and will only continue to increase.

Which brings me to my main point:

You can't be serializing/deserializing data on the CPU. What you want is to have the CPU coordinate the SSD to copy chunks directly -and as is- to the NIC/app/etc.
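
A rough user-space illustration of "copy chunks as is", assuming a Linux-like system: Python's `socket.sendfile()` hands the transfer to the kernel so the file bytes are never parsed or re-encoded by the application. This is only a sketch. The data still passes through the kernel page cache rather than going SSD-to-NIC directly, and the names `send_file_as_is` and `demo_sendfile` are made up for the example.

```python
import os
import socket
import tempfile


def send_file_as_is(path: str, sock: socket.socket) -> int:
    """Stream a file to a connected socket without parsing or re-encoding it."""
    with open(path, "rb") as f:
        # socket.sendfile() uses os.sendfile() where the platform supports it,
        # so the kernel moves the bytes and they never enter a user-space
        # buffer. Elsewhere it falls back to plain send() calls.
        return sock.sendfile(f)


def demo_sendfile():
    # Hypothetical payload: 1200 bytes, small enough to fit in the socket buffer.
    payload = b"chunk-as-is " * 100
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
    sender, receiver = socket.socketpair()
    try:
        sent = send_file_as_is(tmp.name, sender)
        sender.close()  # signal end-of-stream so recv() eventually returns b""
        received = b""
        while chunk := receiver.recv(4096):
            received += chunk
        return sent, received
    finally:
        sender.close()
        receiver.close()
        os.remove(tmp.name)
```

Calling `demo_sendfile()` returns the byte count sent and the bytes read back on the other end of the socket pair, which should match the file contents exactly.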

Short of having your RAM doing compute work*, you would be leaving performance on the table.

----

* Which is starting to appear (https://www.upmem.com/technology/), but that's not quite there yet.

3 comments

outworlder 1968 days ago

> What you want is to have the CPU coordinate the SSD to copy chunks directly -and as is- to the NIC/app/etc.

Isn't that what DMA is supposed to be?

Also, there's work on getting GPUs to load data straight from NVMe drives, bypassing both the CPU and system memory. So you could certainly do similar things with the PCIe bus.

https://developer.nvidia.com/blog/gpudirect-storage/

A big problem is that a lot of data isn't laid out in a way that's ready to be stuffed in memory. When you see a game spending a long time loading data, that's usually why. The CPU will do a bunch of processing to map on-disk data structures to a more efficient memory representation.

If you can improve the on-disk representation to more closely match what's in memory, then CPUs are generally more than fast enough to copy bytes around. They are definitely faster than system RAM.
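
A minimal sketch of what "on-disk representation matches what's in memory" can look like, assuming a file of native-endian unsigned 32-bit integers (a made-up format for illustration). The file is memory-mapped and viewed as integers directly, with no per-record parsing step. The helper names are hypothetical, and a file written this way is only portable between machines with the same byte order and integer width.

```python
import mmap
import os
import tempfile
from array import array


def write_native_u32(path: str, values) -> None:
    """Write values as raw native-endian unsigned ints, exactly as laid out in memory."""
    with open(path, "wb") as f:
        array("I", values).tofile(f)


def read_mapped_u32(path: str):
    """Map the file and index it as integers without a deserialization pass."""
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        # cast() reinterprets the mapped bytes in place; nothing is copied
        # until individual elements are read.
        with memoryview(mm) as raw, raw.cast("I") as view:
            return len(view), view[2], sum(view)


def demo_mapped():
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        path = tmp.name
    try:
        write_native_u32(path, [10, 20, 30, 40])
        return read_mapped_u32(path)
    finally:
        os.remove(path)
```

Here `demo_mapped()` returns the element count, the third element, and the sum, all read straight from the mapping. The memoryviews are released before the mapping closes, which `mmap` requires.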


jedbrown 1968 days ago

This is backward -- this sort of serialization is overwhelmingly bottlenecked on bandwidth (not CPU). (Multi-core) compute improvements have been outpacing bandwidth improvements for decades and have not stopped. Serialization is a bottleneck because compute is fast/cheap and bandwidth is precious. This is also reflected in the relative energy to move bytes being increasingly larger than the energy to do some arithmetic on those bytes.


jeffbee 1968 days ago

An interesting perspective on the future of computer architecture, but it doesn't align well with my experience. CPUs are easier to build and although a lot of ink has been spilled about the end of Moore's Law, it remains the case that we are still on Moore's curve for number of transistors, and for about 15 years we have also been on the same slope for # of cores per CPU. We also still enjoy increasing single-thread performance, even if not at the rates of past innovation.

DRAM, by contrast, is currently stuck. We need materials science breakthroughs to get beyond the capacitor aspect ratio challenge. RAM is still cheap but as a systems architect you should get used to the idea that the amount of DRAM per core will fall in the future, by amounts that might surprise you.
