by imtringued 958 days ago
I don't know where you got that idea from. There is a movement in the complete opposite direction with CXL. Don't waste your time with silly libraries, serialisation or networking. Have a rack that is filled with nothing but pooled RAM and then connect your servers to it (they still retain local RAM as an L4 cache). You now have a huge shared memory machine with distributed CPUs using CXL for cache coherence across the entire system. There have been benchmarks that kept 75% of the memory outside the server and the performance degradation was only 10% compared to keeping the entire data set on a single server.
1 comments

> There have been benchmarks that kept 75% of the memory outside the server and the performance degradation was only 10% compared to keeping the entire data set on a single server.

Performance degradation would greatly depend on how much data was actually touched by the workload outside the server and not solely by the fact that 75% of the memory was attached through CXL, no?

The NUMA latency I last measured on a dual-socket Xeon (Haswell) system was around 130 ns for non-local memory access and 90 ns for local memory access. OTOH, some numbers I found seem to imply that CXL latency is ~200 ns.

By those numbers, CXL latency is roughly 50% higher than non-local NUMA access and more than twice local access, so I don't think a 10% performance degradation is realistic unless most of your working set fits into L1/L2/L3 cache plus the 25% of local memory, or your workload is CPU bound rather than memory bound.
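
To put rough numbers on that, here is a back-of-the-envelope sketch using only the figures quoted above (90 ns local, ~200 ns CXL). The model is my own simplification, not anything from the benchmark: it assumes runtime scales linearly with average DRAM latency and ignores bandwidth, prefetching and memory-level parallelism. The function names and the `memory_bound_share` parameter are hypothetical.

```python
LOCAL_NS = 90.0   # local DRAM latency quoted above
CXL_NS = 200.0    # approximate CXL latency quoted above


def avg_latency_ns(cxl_fraction, local_ns=LOCAL_NS, cxl_ns=CXL_NS):
    """Mean DRAM access latency when cxl_fraction of cache misses go to CXL."""
    return (1.0 - cxl_fraction) * local_ns + cxl_fraction * cxl_ns


def slowdown(cxl_fraction, memory_bound_share=1.0):
    """Relative runtime increase versus all-local memory.

    memory_bound_share is the (assumed) fraction of runtime spent stalled
    on DRAM; 1.0 means fully memory-latency bound.
    """
    ratio = avg_latency_ns(cxl_fraction) / LOCAL_NS
    return memory_bound_share * (ratio - 1.0)


def max_cxl_fraction(target_slowdown, memory_bound_share=1.0):
    """Largest share of misses that may hit CXL while staying within target_slowdown."""
    f = target_slowdown * LOCAL_NS / (memory_bound_share * (CXL_NS - LOCAL_NS))
    return min(f, 1.0)


if __name__ == "__main__":
    print(f"avg latency at 75% CXL accesses: {avg_latency_ns(0.75):.1f} ns")
    print(f"slowdown at 75% CXL accesses:    {slowdown(0.75):.0%}")
    print(f"max CXL share for 10% slowdown:  {max_cxl_fraction(0.10):.1%}")
```

Under these assumptions a fully memory-bound workload could send only about 8% of its cache misses to CXL before exceeding a 10% slowdown, whereas sending 75% of them there would cost roughly 90%. So a 10% result with 75% of capacity on CXL suggests either that the hot data mostly stayed local or that only a small share of runtime was stalled on DRAM.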