The transformer module is currently dominating ML, and is widely used in text, vision, audio, and video models. It was introduced in 2017 and shows no real signs of being displaced. It has no convolutions.
If they take dot products in at least one fully-connected layer, which they do, like everything else derived from the basic MLP model, then they're technically performing convolution.
Of course, the convolution concept breaks down when nonlinear activation functions are introduced, so I'm not sure the equivalence is really all that profound.
I don't think a dot product between high-dimensional vectors is considered a convolution? I'm familiar with convolution between continuous functions, and with kernels in neural networks providing invariance. I'd love to learn more if you have any links that expand on your statement.
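For what it's worth, here is a minimal NumPy sketch of the narrow sense in which the claim holds. A discrete convolution evaluated at a single position is a dot product between a window of the input and the flipped kernel, so one unit of a fully-connected layer matches a "valid" convolution whose kernel is as long as the input. The names x, w, and k are made up for illustration, and this only covers the linear part, before any activation function is applied.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=8)  # hypothetical input vector
w = rng.normal(size=8)  # hypothetical weights of one fully-connected unit

# What one fully-connected unit computes before its activation.
dot = float(np.dot(w, x))

# A "valid" convolution with a kernel as long as the input yields one value.
# True convolution flips the kernel, so pre-flip w to cancel that out.
conv = float(np.convolve(x, w[::-1], mode="valid")[0])

# Cross-correlation (what most deep learning "conv" layers compute) needs no flip.
corr = float(np.correlate(x, w, mode="valid")[0])

assert np.isclose(dot, conv) and np.isclose(dot, corr)

# With a shorter kernel, the same operation becomes a sliding dot product
# that reuses one set of weights at every position.
k = rng.normal(size=3)  # hypothetical 3-tap kernel
sliding = np.array([np.dot(k, x[i:i + 3]) for i in range(len(x) - 2)])
assert np.allclose(sliding, np.correlate(x, k, mode="valid"))
```

So the equivalence is real but degenerate: with a full-length kernel there is exactly one output position, which means none of the sliding or weight sharing that people usually have in mind when they say "convolution".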