| But "real memory" is neither what C presents nor the copy semantics used in FP. The CPU keeps memory in 64-byte cache lines. There is a complex bus protocol to shuffle cache lines and subparts of cache lines to main memory. There are additional complex protocols for cache coherence. The cost of reading 64 bytes from memory into a cache line and then, on write-back, storing it at a different location in main memory is zero. Memory is always being copied into our L1 and L2 caches. Copying data eliminates most of the need for the complex and costly cache coherence protocols. Yes, we get a lot of this for free in the CPU implementation, but a lot of complexity goes into imposing what is really beginning to be an unnatural model (mutable memory) on a hierarchical memory system. FP uses a log-based model. You write to fresh memory: no aliasing, no coherence, no conflicts. Then, during GC, you remap memory. Current hardware can't "remap memory" efficiently, but it seems like the FP approach, in one form or another, is the better approach for dealing with high scalability and deep memory hierarchies: write to fresh memory, then expose a remap operation at the hardware level. SSDs are a bit like that internally. |
I am not sure what the cost is of the cache coherency hardware, but if it were high, then presumably single-core CPUs would have a big advantage on single-threaded workloads. That doesn't seem to be the case.
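
For anyone unsure what "write to fresh memory, then remap during GC" means in the quoted comment, here is a toy sketch in Python. It is only an illustration of the log-based idea at the software level, not a model of how any CPU, garbage collector, or SSD actually works; `LogStore` and all of its method names are made up for this example.

```python
class LogStore:
    """Toy append-only store: writes never overwrite, GC compacts and remaps."""

    def __init__(self):
        self.log = []      # append-only storage; existing slots are never mutated
        self.index = {}    # name -> slot holding the live version

    def write(self, name, value):
        # Always write to a fresh slot, so a write never touches old data.
        self.log.append(value)
        self.index[name] = len(self.log) - 1

    def read(self, name):
        return self.log[self.index[name]]

    def gc(self):
        # Copy only live values into a new log and remap names to the new slots.
        new_log = []
        new_index = {}
        for name, slot in self.index.items():
            new_index[name] = len(new_log)
            new_log.append(self.log[slot])
        self.log, self.index = new_log, new_index


store = LogStore()
store.write("x", 1)
store.write("x", 2)   # the old version stays in slot 0 until GC runs
store.write("y", 3)
slots_before = len(store.log)
store.gc()
slots_after = len(store.log)
```

The point of the sketch is that the write path never modifies shared state in place; the only step that has to reason about what is live is the remap in `gc`, which is the operation the quoted comment wishes hardware exposed.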