by oivaksef 2345 days ago
Why don't we have two sets of memory - the main, slow, cached memory, and a smaller set of super fast low latency memory, like cache, but that the programmer and compiler can use explicitly?

Sort of like special-purpose performance instructions such as SIMD, but on the memory side.

3 comments

Special instructions won't help latency. There are prefetch instructions, but they basically just do loads. Each load on x64 (at least on Intel) pulls a minimum of 128 bytes (two 64-byte cache lines) from memory into cache.
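To make "prefetch instructions basically just do loads" concrete, here is a rough sketch of a software prefetch in an indexed gather loop. It assumes GCC or Clang (`__builtin_prefetch` is their builtin, not standard C++). The function name `gather_sum_prefetch` and the look-ahead distance `kAhead` are made up for illustration, and the distance is a guess you would have to tune. Whether this beats the hardware on a given CPU is not something the sketch shows.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical look-ahead distance; an assumption, not a tuned value.
constexpr std::size_t kAhead = 8;

// Sums data[idx[i]] for every i, hinting the cache about the element
// that will be needed kAhead iterations from now.
std::uint64_t gather_sum_prefetch(const std::vector<std::uint32_t>& idx,
                                  const std::vector<std::uint64_t>& data) {
    std::uint64_t total = 0;
    const std::size_t n = idx.size();
    for (std::size_t i = 0; i < n; ++i) {
#if defined(__GNUC__)
        if (i + kAhead < n) {
            // rw = 0 (read), locality = 1 (low temporal locality).
            // This is only a hint; the CPU is free to ignore it.
            __builtin_prefetch(&data[idx[i + kAhead]], 0, 1);
        }
#endif
        assert(idx[i] < data.size());  // indices must be in range
        total += data[idx[i]];
    }
    return total;
}
```

The prefetch does not change the result, only (possibly) when the cache line arrives, so the function returns the same sum with or without the hint.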

The Cell from Sony did have explicit cache control (I think) and it was notoriously difficult to program for.

The real reason cache isn't handled explicitly, though, is that it isn't necessary. You can get good performance and cache usage at the C++ level; you just have to know how the CPU works and access memory linearly so it can be prefetched. I've tried using prefetch instructions, and beating the out-of-order buffer in the CPU is actually very difficult.
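A minimal sketch of what "access memory linearly" means in practice, assuming a matrix stored row-major in a flat `std::vector`. The names `sum_linear` and `sum_strided` are made up for this example. Both compute the same sum; only the order in which memory is touched differs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Walks the row-major buffer front to back: consecutive addresses,
// which is the pattern the hardware prefetcher can follow.
std::uint64_t sum_linear(const std::vector<std::uint32_t>& m,
                         std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    std::uint64_t total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same sum, but column-first: each step jumps `cols` elements ahead,
// so for large matrices successive accesses land on different cache lines.
std::uint64_t sum_strided(const std::vector<std::uint32_t>& m,
                          std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    std::uint64_t total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

On a matrix much larger than the cache, the linear version is the one I'd expect to run faster, but measure it on your own hardware rather than taking that on faith.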

It's a common misconception that cache is "low latency". It certainly is compared to main memory (which can take ~200 cycles before it starts feeding the CPU the bytes it wanted), but an L3 hit can easily take 40-60 cycles as well, so it's not even an order of magnitude difference. By the time you're hitting L3, you're kind of already screwed.

For perf, I'd much rather have a larger L1, or a (much) bigger register file.

Some processors allow caches to be configured as scratchpad memory, which is what you're describing.