Of course it can be pure, so long as the shared data is not mutated. Functional languages such as Haskell don't copy values all over the place when calling functions, they pass in pointers.
But that's an optimisation. Which can be switched off to get the benefits from parallelism. This is a tradeoff that programmers would traditionally have to consider, but with functional programming the compiler can figure it out for us.
"It's not decidable" is not a guarantee that humans can do better than a compiler. It means any approach will sometimes be wrong, but a compiler might well (in principle) be wrong less often than a human.
This is separate from the "a human has a lot of additional context that's hard to express" argument, which is valid but has nothing to do with decidability.