ST is exactly the same sort of thing junke was talking about, except it also uses the type system to ensure the imperative core doesn't leak into the outside world.
I think the grandparent comment boils down to saying "there's no known persistent data structure with O(1) random access for read and write". Whether you need such a structure (for a cache, histogram, frame buffer, etc.) is up to you.
Some functional languages allow transient data structures (via something like ST or uniqueness types), so you can match the performance of any imperative algorithm at the cost of ugly code.
> Note also that all of this discusses only asymptotic running times. Many techniques for implementing purely functional data structures give you a certain amount of constant factor slowdown, due to extra bookkeeping necessary for them to work, and implementation details of the language in question. The benefits of purely functional data structures may outweigh these constant factor slowdowns, so you will generally need to make trade-offs based on the problem in question.
Hash tables are a big example. You can implement pure associative arrays, and they have some nice benefits, but the best functional implementations still have much poorer asymptotic performance than a basic hash table.
Pure data structures also tend to include a lot of extra pointers. That can create a lot of overhead. A pure list of 32-bit integers will need either 4 or 8 bytes of overhead (depending on whether you're running 32 or 64 bit) for every item stored. A resizable array just needs whatever empty space it preallocates, plus the occasional memcpy when it needs to add capacity. Also locality of reference and all that fun stuff.
Another example: Caching of results, transparent to the user of your function. If your function is otherwise pure you can simply use its parameter and hash them for a key to your result index. Examples where I've used this: Caching of regex results, replacing file reads with modification date polls and read from cache if not changed..
I was disappointed to learn that Quicksort implemented functionally is almost always much much slower than if implemented procedurally, to the point that functional langs use other sorts such as mergesort. What makes it extra annoying is that Quicksort implemented functionally is so damn elegant!
Neural networks are a good example where mutable structures are the only realistic choice. You're dealing with fully connected networks of millions of nodes, each of which needs to be updated multiple times for each layer at every pass.
You can do merge sort in Haskell asymptotically as well as C, but not quicksort (because you can't mutate things in place).
I am of course omitting things like ST which do give you this sort of ability in Haskell, but I doubt that's what the OP meant by "purely functional".