by dspillett 2404 days ago

Just a guess here, but I think it is because registers are a core internal component of the CPU and have been there in some form the whole time, whereas cache was more of an afterthought, added once CPUs and other processing units stopped being the bottleneck and started to significantly outstrip the performance of (practical/affordable) memory and memory controllers. It used to be the other way around: memory sub-systems had artificial wait states because the processors could not otherwise keep up, rather than processors sitting around waiting for responses.

Also, cache is essentially optional, and its configuration can and will vary between otherwise identical-looking systems: not just sizes, but how many levels there are (each of which can have different properties), how it is shared amongst cores and other units, its speed relative to other memory levels, how cache lines are arranged and mapped, their size, and so on. If you are compiling to optimise for cache use, you end up having to JIT compile, or compile several versions of some routines and include them all so the right one can be chosen at run-time, or ship different versions of the whole compiled output for different systems. All of those things happen at times anyway for other reasons, but presumably the overall pay-off of doing the same for cache variances isn't high enough to make it worth building into general-purpose compilers (though the case for or against this sort of work in domain-specific compilers and other tool chains may be quite different).
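As a rough illustration of the "compile several versions of a routine and choose at run-time" option, here is a minimal C sketch. Everything in it is an assumption made for the example: the tiled transpose routine, the three tile sizes, the "two tiles must fit in L1 data cache" selection rule, and the 32 KiB fallback are all hypothetical. `_SC_LEVEL1_DCACHE_SIZE` is a glibc extension to `sysconf`, so it may be missing or report 0 on other systems, hence the fallback.

```c
#include <stddef.h>
#include <unistd.h>

/* Stamp out one tiled transpose per tile size. TILE is a compile-time
   constant inside each variant, so each one is optimised separately. */
#define DEFINE_TRANSPOSE(NAME, TILE)                                     \
    static void NAME(const double *src, double *dst, size_t n) {         \
        for (size_t ib = 0; ib < n; ib += (TILE))                        \
            for (size_t jb = 0; jb < n; jb += (TILE))                    \
                for (size_t i = ib; i < n && i < ib + (TILE); i++)       \
                    for (size_t j = jb; j < n && j < jb + (TILE); j++)   \
                        dst[j * n + i] = src[i * n + j];                 \
    }

DEFINE_TRANSPOSE(transpose_tile16, 16)
DEFINE_TRANSPOSE(transpose_tile32, 32)
DEFINE_TRANSPOSE(transpose_tile64, 64)

typedef void (*transpose_fn)(const double *, double *, size_t);

/* Hypothetical policy: use the largest tile for which one source tile
   plus one destination tile fit in the L1 data cache. */
static transpose_fn pick_transpose(long l1d_bytes) {
    const long elem = (long)sizeof(double);
    if (l1d_bytes >= 2L * 64 * 64 * elem) return transpose_tile64;
    if (l1d_bytes >= 2L * 32 * 32 * elem) return transpose_tile32;
    return transpose_tile16;
}

/* Ask the OS for the L1 data cache size where it can tell us. */
static long detect_l1d_bytes(void) {
#ifdef _SC_LEVEL1_DCACHE_SIZE
    long v = sysconf(_SC_LEVEL1_DCACHE_SIZE); /* glibc extension */
    if (v > 0) return v;
#endif
    return 32L * 1024; /* assumed fallback, not a measured value */
}

/* Public entry point: the variant is chosen once, on first use. */
void transpose(const double *src, double *dst, size_t n) {
    static transpose_fn chosen = NULL;
    if (chosen == NULL) chosen = pick_transpose(detect_l1d_bytes());
    chosen(src, dst, n);
}
```

Callers only ever see `transpose(src, dst, n)`; the binary carries all three variants, and the cost is the extra code size plus one indirect call. A real tool chain would also have to account for associativity, sharing between cores and the other levels, which is the variability described above.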

Some designs take the view that we shouldn't need to care about the implementation details of any memory, let alone L1/2/3/? cache: just access storage and let the OS and hardware use whatever faster memory levels they have access to, as they see fit, to optimise that storage access.