It's easy to buy memory, but hard to buy L2/L3 cache. The whole point of the exercise is to scale more easily on multicore architectures, but it's no good if you blow out the cache thousands of times per second and bottleneck the system on memory accesses.
Additionally, DRAM and VRAM bandwidth are always at a premium. Every copy you make (and you make a lot of them when objects are immutable) consumes memory bandwidth.
This is especially important on mobile devices and FPGAs, and wherever the sheer volume of data is huge (GPU workloads and big data).
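To make the copying cost concrete, here is a minimal sketch in Python. The helper names (`with_updated`, `update_in_place`, `estimated_copy_traffic`) and the workload figures are made up for illustration, and the traffic estimate is only a rough back-of-the-envelope bound that assumes every update copies the whole buffer.

```python
from typing import List, Tuple


def with_updated(values: Tuple[int, ...], index: int, new_value: int) -> Tuple[int, ...]:
    """Immutable-style update: builds a brand-new tuple, so every slot is copied."""
    return values[:index] + (new_value,) + values[index + 1:]


def update_in_place(values: List[int], index: int, new_value: int) -> None:
    """Mutable update: overwrites one slot and copies nothing."""
    values[index] = new_value


def estimated_copy_traffic(n_elements: int, element_size: int, updates: int) -> int:
    """Rough bytes written if each update copies the entire buffer (reads not counted)."""
    return n_elements * element_size * updates


frozen = (1, 2, 3, 4)
changed = with_updated(frozen, 2, 99)   # new tuple; 'frozen' is left untouched

scratch = [1, 2, 3, 4]
update_in_place(scratch, 2, 99)         # same list object, one slot rewritten

# Hypothetical workload: 1,000,000 elements of 8 bytes each, updated 1,000 times per second.
traffic = estimated_copy_traffic(1_000_000, 8, 1_000)
print(f"{traffic / 1e9:.0f} GB/s written just for copies")
```

Under those assumed numbers the immutable version writes about 8 GB every second to change one element at a time, while the in-place version writes 8 bytes per update. A buffer that size also does not fit in a typical L2 cache, so each copy streams through main memory.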