Convolution is part of the network design though.
Would a fully connected network learn to convolve? Or would it turn out that convolution is not necessary?
The interesting part here isn't the convolution itself, it's how convolutional layers turn out to act like "filters" or "detectors" for individual features. This is explained very well in the distill.pub article linked by GP.
We know the architecture of LLMs because we created it, but we don't yet have the same level of understanding about them, or the same quality of analytical tools for reasoning about them.
They do, and in fact it's relatively straightforward to show empirically on e.g. MNIST. The problem is that you need a much, much larger network in the FCN case, and thus far more data and far more data augmentation to get a good result that isn't overfit to hell.
In the case of CNNs, the reason it works is that an image of an object X is still an image of object X if the X is shifted left or right. In other words, the property "is an image of X" is translationally invariant. CNN are basically the simplest way to encode translational invariance.
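To make the shift argument concrete, here is a minimal sketch in plain Python. It assumes a 1D signal with wrap-around (circular) boundaries, and the helper names `circular_conv1d` and `shift` are made up for illustration. Strictly speaking, the convolution itself is shift-equivariant (shift the input and the feature map shifts with it); you get invariance once you pool over positions, here with a global max.

```python
def circular_conv1d(signal, kernel):
    """Slide `kernel` over `signal`, wrapping around at the edges."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Cyclically shift `signal` to the right by k positions."""
    n = len(signal)
    k %= n
    return signal[n - k:] + signal[:n - k]


signal = [0, 0, 1, 3, 1, 0, 0, 0]   # a small "bump" standing in for object X
kernel = [1, -2, 1]                 # hypothetical feature detector

features = circular_conv1d(signal, kernel)
shifted_features = circular_conv1d(shift(signal, 2), kernel)

# Equivariance: shifting the input just shifts the feature map.
equivariant = shifted_features == shift(features, 2)

# Invariance: a global max over positions ignores where the bump is.
invariant = max(shifted_features) == max(features)

print(equivariant, invariant)
```

The same kernel is reused at every position, which is what bakes the symmetry into the architecture instead of leaving it to be learned from data.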
> CNN are basically the simplest way to encode translational invariance
That's the geometric deep learning theory, isn't it? Do you know if there's a list somewhere of which invariances can be encoded by which architectures? Like an overview?
The point of using a CNN instead of an FCN is that you force it to learn in a certain way that prevents overfitting. But given a sufficiently large dataset and proper data augmentation, you would expect an FCN to be able to identify objects regardless of translation. It's just that a CNN would train easier and better, with a smaller network (an FCN doing convolutions would be very wasteful).
That's why traditionally you would pick your architecture to help it learn in a certain way (images=CNN, text=RNN/LSTM/GRU). But the nice thing about transformers is that they are more general.
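A rough sketch of the "wasteful" part, again in plain Python with a 1D circular convolution as a stand-in for the 2D case. The helper names are hypothetical. The idea is that a convolution is just a dense layer whose weight matrix is mostly zeros, with the same few values repeated on every row, so an FCN has to spend n*n free weights to discover a pattern that a CNN gets from a handful.

```python
def circular_conv1d(signal, kernel):
    """Slide `kernel` over `signal`, wrapping around at the edges."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def conv_as_dense_matrix(kernel, n):
    """Weight matrix of a dense layer that reproduces the convolution.

    Assumes len(kernel) <= n. Row i holds the kernel, shifted to start at column i.
    """
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j, kj in enumerate(kernel):
            w[i][(i + j) % n] += kj
    return w


def dense(w, x):
    """Plain matrix-vector product, i.e. a bias-free fully connected layer."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


x = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, -2, 1]

w = conv_as_dense_matrix(kernel, len(x))
same_output = dense(w, x) == circular_conv1d(x, kernel)

dense_params = len(x) * len(x)   # weights the FCN layer has to learn
conv_params = len(kernel)        # weights the conv layer has to learn

print(same_output, dense_params, conv_params)
```

Scaling the same counting argument up to a 28x28 MNIST image, a bias-free dense layer mapping 784 inputs to 784 outputs has 784 * 784 = 614,656 weights, while a single 3x3 convolutional filter has 9.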