We're already halfway into a heterogeneous future, with chiplets[1], mixed cores[2][3], etc. Could we extend this to memory, with some soldered (on-chip?) high-speed memory plus slots for additional DIMMs that are slower, yet still faster than the alternatives?
Or would the extra memory-controller complexity likely never be worth the cost?
> Could we expand this to memory, having some soldered (on-chip?) high-speed memory, and then slots for additional slower, yet faster than the alternatives, DIMMs?
Intel's already doing that with Xeon Max: it has both on-package HBM and an external DDR5 interface. It can be configured to run entirely from HBM with no DDR5 installed at all, to use the HBM as a huge cache in front of the DDR5, or to map the HBM and DDR5 into different memory regions and let software decide how to use each. I don't think there's been any indication of that approach filtering down to consumer architectures, though. Intel is talking about doing RAM-on-package there, but without any external memory interface alongside it.
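For the "let software decide" mode, here is a rough sketch of how a program on Linux might find the separate memory region. It assumes the fast memory shows up as a NUMA node with no CPUs attached, which is how flat-mode HBM is usually exposed, but a CPU-less node could just as well be something else (CXL memory, for instance), so treat the result as a hint. The function names are made up for illustration; only the sysfs layout (`/sys/devices/system/node/nodeN/cpulist`) is real.

```python
import os
import re


def parse_cpulist(text):
    """Expand a sysfs cpulist such as '0-3,8' into a list of CPU ids."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue  # an empty cpulist means the node has no CPUs
        lo, _, hi = part.partition("-")
        cpus.extend(range(int(lo), int(hi or lo) + 1))
    return cpus


def numa_nodes(root="/sys/devices/system/node"):
    """Return {node id: [cpu ids]} read from the Linux sysfs NUMA tree.

    Returns an empty dict when the tree does not exist (non-Linux systems).
    """
    nodes = {}
    if not os.path.isdir(root):
        return nodes
    for name in os.listdir(root):
        match = re.fullmatch(r"node(\d+)", name)
        if not match:
            continue  # skip files like 'possible' or 'online'
        try:
            with open(os.path.join(root, name, "cpulist")) as f:
                nodes[int(match.group(1))] = parse_cpulist(f.read())
        except OSError:
            continue
    return nodes


def memory_only_nodes(nodes):
    """Nodes with memory but no CPUs: candidates for a separate memory tier."""
    return sorted(node for node, cpus in nodes.items() if not cpus)


if __name__ == "__main__":
    print("memory-only NUMA nodes:", memory_only_nodes(numa_nodes()))
```

Once you know the node number, something like `numactl --membind=<node> ./app` will place an unmodified program's allocations there, which is about as far as most software gets toward being tier-aware.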
Obviously high-end consumer CPUs already have about 30MB of on-chip memory, with server CPUs reaching a solid 300MB. We just prefer to call it L2 and L3 cache. If we add more memory in a chiplet format I suspect mainstream CPUs would simply expose (or rather hide) it as L3 or L4 cache.
Most software isn't even NUMA-aware, and would completely fail to take advantage of a tiered memory hierarchy if it were given the option. But if we make the fast memory a big cache and let the CPU worry about it, it's a "cheap" win.
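To put a rough number on that "cheap" win, here is a back-of-the-envelope model. It is a sketch under a big simplifying assumption: every byte is served from exactly one tier, and the two tiers are not read in parallel. The function name and the bandwidth figures are hypothetical, not measurements of any real part.

```python
def effective_bandwidth(hit_fraction, fast_gbs, slow_gbs):
    """Blended bandwidth (GB/s) when hit_fraction of traffic hits the fast tier.

    The time to move one GB is the weighted sum of the per-tier times,
    so the result is a weighted harmonic mean of the two bandwidths.
    """
    if not 0.0 <= hit_fraction <= 1.0:
        raise ValueError("hit_fraction must be between 0 and 1")
    seconds_per_gb = hit_fraction / fast_gbs + (1.0 - hit_fraction) / slow_gbs
    return 1.0 / seconds_per_gb


if __name__ == "__main__":
    # Hypothetical tiers: 1000 GB/s fast memory in front of 250 GB/s DIMMs.
    for hit in (0.0, 0.5, 0.9, 1.0):
        print(hit, round(effective_bandwidth(hit, 1000.0, 250.0), 1))
```

With those made-up numbers, a 50% hit rate only gets you to 400 GB/s, because the slow tier dominates the total time. The cache approach pays off when the working set mostly fits in the fast memory, which is the same bet L2 and L3 already make.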
Though there is the Xeon Phi, which has about 16GB of on-package memory that can be configured either as cache or as "scratchpad" memory. But of course that's not meant for general-purpose software.