Hacker News new | ask | show | jobs
by vagab0nd 2401 days ago
This might be a naive question: how did we decide as an industry that cache should be controlled by the hardware, but registers and main memory by the compiler?
3 comments

Just a guess here, but I'm thinking it has something to do with the registers being an internal core component of the CPU that have been there in some form the whole time whereas cache was a little more of an afterthought when CPUs and other processing units stopped being the bottleneck and started to significantly outstrip the performance of (practical/affordable) memory and memory controllers. It used to be that the memory sub-systems had artificial wait states as the processors could not keep up otherwise, rather than processors siting around waiting for responses.

Also, cache is essentially optional, and its configuration (not just sizes, but how it is shared amongst cores and other units, speeds relative to other memory levels, even how many levels there are (and each level can have different properties) how cache rows are arranged and mapped, their size, ..., etc.) can and will vary between otherwise identical looking systems. If you are compiling to optimise for cache use you end up either having to JiT compile, or compile several versions of some routines and include them all so which one is used can be chosen at run-time, or have different versions of the whole compiled output for different systems. All of those things happen at times anyway for other reasons, but presumably the overall pay-off of doing the same for cache variances isn't high enough for it to be worthwhile building into general purpose compilers (though the cases for/against this sort of work in domain specific compilers and other tool chains may be quite different).

Some designs argue that we shouldn't need to care about the implementation details of any memory let alone L1/2/3/? cache - just access storage and let the OS & hardware make use of the faster memory levels it has access to as it sees fit to optimise that storage access.

The wonderful thing about caches is that they are invisible to the programmer; the designer can add bigger caches, more levels of caches, different configurations of shared/private/inclusive/exclusive caches... and it Just Works.

However, there are many machines, which instead of using caches, use programmer-visible “scratchpads”. There are many reasons for this, but two big reasons are performance-predictability (real-time systems) or to avoid the hw complexity of cache management.

In general though, scratchpads are hugely painful to program for and terrible for code portability.

What do you mean by 'controlled by hardware'? The registers the compiler chooses for you are an abstraction themselves, one of the first things a CPU does is register renaming. Same for main memory, you are presented mostly an abstraction of main memory, with a ton of layers in-between (load/store buffers, write-combine buffers, coherent caches, etc)

Turning the argument the other way, you as a programmer can control a lot about caches: you can prefetch cache lines, invalidate them and use streaming instructions to bypass them.

What you said makes sense. The question is more from the perspective of the compiler/assembler. At these layers you explicitly say, move this number to this register, or read a number from this address. But you very rarely are able to say, copy this chunk of data into this cache line. Sure there are exceptions (like the GPU), and there are cases where you can hack it to do what you want. But in general you don't get to control the cache in a very specific way.