How about having a large pool of unified memory and expanding the next level of cache (L3?) to absorb more of the CPU's low-latency RAM usage?
As a rule, increasing the size of a cache increases its latency, and how much of it you can actually use is capped by the quality of your cache management algorithms and by the latency of the level above it.
Since CPUs are highly optimized, both increasing the latency of the main memory and increasing the size of L3 will probably lead to larger L3 latency.
And yes, an L4 cache can be one way out of that problem. Another way is to make the L3 cache lines wider and work the hell out of improving your management algorithm.
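To make the trade-off concrete, here is a rough sketch using the textbook average memory access time (AMAT) model. Each level's latency is paid by every access that reaches it, and misses fall through to the next level and finally to main memory. All latencies (in cycles) and hit rates are made-up round numbers for illustration, not measurements of any real CPU, and `Level`/`amat` are hypothetical helpers.

```python
# Rough average memory access time (AMAT) model of a cache hierarchy.
# All latencies (cycles) and hit rates are hypothetical illustration values,
# not measurements of any real CPU.
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Level:
    name: str
    latency: float   # access latency in cycles
    hit_rate: float  # fraction of accesses reaching this level that hit here


def amat(levels: Sequence[Level], memory_latency: float) -> float:
    """AMAT = L1 latency + L1 miss rate * (L2 latency + L2 miss rate * (...)).
    Evaluated from the last cache level back toward L1."""
    total = memory_latency
    for level in reversed(levels):
        total = level.latency + (1.0 - level.hit_rate) * total
    return total


l1 = Level("L1", 4, 0.90)
l2 = Level("L2", 12, 0.60)
MEMORY = 200  # hypothetical main-memory latency in cycles

configs = {
    "baseline L3": [l1, l2, Level("L3", 40, 0.50)],
    # Bigger L3: somewhat better hit rate, but noticeably higher latency.
    "bigger, slower L3": [l1, l2, Level("L3", 60, 0.55)],
    # Keep the fast L3 and add a slower L4 behind it.
    "baseline L3 + L4": [l1, l2, Level("L3", 40, 0.50), Level("L4", 70, 0.50)],
}

for name, levels in configs.items():
    print(f"{name}: {amat(levels, MEMORY):.2f} cycles")
```

With these particular numbers the bigger but slower L3 comes out worse than the baseline and the extra L4 comes out best. Tweak the assumed hit rates or latencies a bit and the ranking can flip, which is roughly the point: none of these options wins automatically.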
It's not a theoretically impossible problem. It's also not something you can solve automatically with a bit more money or some simple decisions. It's possible this is the best architecture available, but it's not certain by any means.