When I saw the example in the paper of building a list by copying the entire list for every new entry and trashing the old one I knew performance would be a problem. A smart compiler can optimize that if it knows what you are doing, but they were writing a research compiler.
There is a tradeoff with this kind of immutable oriented programming that you trade off relatively easy parallelism for heinous memory thrashing--and cache misses are already a huge bottleneck on modern architectures. Rust seems to have struck a good balance here, I'm keeping a close eye on it as it continues to mature.
Another thing to keep in mind is that with everything being immutable you end up with a pure language. Once that happens, the compiler starts being able to perform some really nice rewrites of your code really easily. The problem is that this is all or nothing - unless your language is actually pure you can't do this.
It's easy to buy memory, but hard to buy L2/L3 cache. The whole point of the exercise is to scale more easily on multicore architectures, but it's no good if you blow out the cache thousands of times per second and bottleneck the system on memory accesses.
Additionally, DRAM and VRAM bandwidth are always at premium. Whenever you're making a copy (which you do a lot when objects are immutable), you use memory bandwidth.
This is especially important on mobile, FPGAs and in cases where sheer volume of data is huge. (GPU and big data)
There is a tradeoff with this kind of immutable oriented programming that you trade off relatively easy parallelism for heinous memory thrashing--and cache misses are already a huge bottleneck on modern architectures. Rust seems to have struck a good balance here, I'm keeping a close eye on it as it continues to mature.