| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by mrlongroots 284 days ago

> This was a comparison of two methods of moving data from the VFS to application memory. Depending on cache status this would run the whole gambit of mapping existing memory pages, kernel to userspace memory copies, and actual disk access.

Yes, I think maybe a reasonable statement is that a benchmark is supposed to isolate a meaningful effect. This benchmark was not set up correctly to isolate a meaningful effect IMO.

> Also, while we’re being annoyingly technical, a lot of server CPUs can DMA straight to the L3 cache so your proof of impossibility is not correct.

Interesting, didn't know that, thanks!

I think this does not invalidate the point though. You can temporarily stream directly to the L3 cache with DDIO, but as it fills out the cache will get flushed back to the main memory anyway and you will ultimately be memory-bound. I don't think there is some way to do some non-temporal magic here that circumvents main memory entirely.

1 comments

johncolanduoni 283 days ago

I agree, not a well designed benchmark. The mmap side did not use any of the mechanisms for paging in data before faulting each page, which are table stakes for this usecase.

The point of DDIO is that the data is frequently processed fast enough that it can get pulled from L3 to L1 and be finished with before it needs to be flushed to main memory. For a system with a coherent L3 or appropriate NUMA this isn’t really non-temporal, it’s more like the L3 cache is shared with the device.

link