Computation in general is partly about moving data around, yes. What transformers do is learn how to move data around in a context-relevant manner. This makes them far more expressive than traditional deep nets in the kinds of computation they can perform.
It's not a misinterpretation. What attention does is discover association matrices which bind locations in the context window, and these associations are context sensitive. But binding locations through an association matrix is an implementation of the concept of routing, which is just moving data.
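To make the routing point concrete, here is a minimal sketch in plain Python. It assumes a single head with no learned projections, and the toy keys, queries, values, and the `scale` sharpening factor are made up for illustration. The attention weights form a matrix over context positions, and multiplying the values by that matrix moves each value to whichever position asked for it.

```python
import math

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values, scale=1.0):
    """Single-head attention with no learned projections (illustrative simplification).

    Returns the association matrix (one row of weights per query position)
    and the outputs obtained by mixing the values with those weights.
    """
    weights = []
    for q in queries:
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights.append(softmax(scores))
    dim = len(values[0])
    outputs = [
        [sum(w * v[d] for w, v in zip(row, values)) for d in range(dim)]
        for row in weights
    ]
    return weights, outputs

# Toy context with three positions. Keys are one-hot, and each query
# matches the key of a different position.
keys = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
queries = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]  # 0 looks at 2, 1 looks at 0, 2 looks at 1
values = [[10.0], [20.0], [30.0]]

# A large scale makes the softmax nearly one-hot, so the routing is easy to see.
weights, outputs = attention(queries, keys, values, scale=20.0)
```

With these toy inputs the association matrix is close to a permutation, so the value stored at position 2 ends up at position 0, and so on. Change the queries and the same values get routed somewhere else, which is the context-sensitive part.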
Also, the induction heads described in the link I gave explicitly move data forward in the context window.
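As a rough illustration of what an induction head does, here is a hard-coded idealization in Python. A real induction head is a learned, soft attention circuit, and `induction_predict` is a hypothetical helper written for this sketch, not anything from a library. It finds the most recent earlier occurrence of the current token and copies forward whatever came after it.

```python
def induction_predict(tokens):
    """Idealized induction behaviour: [A][B] ... [A] -> predict [B].

    Scans backwards for the most recent earlier occurrence of the last
    token and returns the token that followed it, or None if the last
    token has not appeared before.
    """
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # data from earlier in the context, moved forward
    return None

context = ["A", "B", "C", "A"]
prediction = induction_predict(context)
```

Here `prediction` is the token that followed the earlier "A". Nothing about that answer lives in the function itself; it is read out of the context and carried to the current position.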
I consider aggregate routing to be distinct from moving data. If the context is temporary, then the "data" (weights and tokenizer) stays in place. LLMs are static; they do not move data so much as infer from it.
This is a misrepresentation of how transformers behave, and I think you should double-check the definition before dunking on other people's work.