by neilmovva 2535 days ago
Actually, the L3 cache is also sharded across chiplets, so there's a small (~8 MB) local portion of L3 that is fast, while accesses to remote slices have to cross AMD's inter-die connection fabric and incur a serious latency penalty. On first-gen Epyc/Threadripper, nonlocal L3 hits were almost as slow as DRAM, at ~100 ns (!).
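
If you want to see which cores share a local L3 slice on a Linux box, here's a rough sketch that reads the cache topology the kernel exposes under /sys/devices/system/cpu (the `level` and `shared_cpu_list` files). The function names `parse_cpu_list` and `l3_groups` are made up for this example, and it assumes a kernel that populates those sysfs files. This is separate from the NUMA node information and may or may not line up with it.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a sysfs CPU list such as '0-3,8-11' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def l3_groups(sysfs_root="/sys/devices/system/cpu"):
    """Return the distinct sets of logical CPUs that share one L3 instance."""
    groups = set()
    # Each cpuN/cache/indexM directory describes one cache visible to that CPU.
    for level_file in Path(sysfs_root).glob("cpu[0-9]*/cache/index*/level"):
        if level_file.read_text().strip() != "3":
            continue  # only interested in L3
        shared = level_file.with_name("shared_cpu_list").read_text()
        groups.add(tuple(parse_cpu_list(shared)))
    return sorted(groups)


if __name__ == "__main__":
    for i, cpus in enumerate(l3_groups()):
        print(f"L3 group {i}: CPUs {list(cpus)}")
```

Threads pinned within one printed group should stay on the fast local slice; anything shared across groups is where the remote-L3 penalty would apply.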
1 comment

Does that local vs remote L3 show up in the NUMA information?