| HN Mirror



	by mungoman2 1215 days ago
	Convolution is part of the network design, though. Would a fully connected network learn to convolve on its own? Or would it turn out that convolution is not necessary?

3 comments

nerdponx 1215 days ago

The interesting part here isn't the convolution itself, it's how convolutional layers turn out to act like "filters" or "detectors" for individual features. This is explained very well in the distill.pub article linked by GP.

We know the architecture of LLMs because we created it, but we don't yet have the same level of understanding about them, or the same quality of analytical tools for reasoning about them.


xmcqdpt2 1214 days ago

They do, and in fact it's relatively straightforward to show empirically on e.g. MNIST. The problem is that you need a much, much larger network in the FCN case, and thus way more data and way more data augmentation, to get a good result that isn't overfit to hell.

In the case of CNNs, the reason it works is that an image of an object X is still an image of object X if the X is shifted left or right. The label is translationally invariant. CNN are basically the simplest way to encode translational invariance.
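
To make the shift argument concrete, here is a minimal pure-Python sketch, assuming a 1D signal with wrap-around edges. The helper names `conv1d_circular` and `shift` are made up for illustration and are not from any library. It suggests that the convolution itself carries a shift of the input through to the feature map, and that a global pooling step on top is what makes the final response independent of position.

```python
def conv1d_circular(signal, kernel):
    """Cross-correlate signal with kernel, wrapping around at the edges."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def shift(signal, k):
    """Circularly shift a list to the right by k positions."""
    k %= len(signal)
    return signal[-k:] + signal[:-k] if k else list(signal)


signal = [0, 0, 1, 3, 1, 0, 0, 0]  # toy "object" at positions 2..4
kernel = [1, 2, 1]                 # toy feature detector (assumed weights)

out = conv1d_circular(signal, kernel)
out_shifted = conv1d_circular(shift(signal, 2), kernel)

# Shifting the input shifts the feature map by the same amount.
assert out_shifted == shift(out, 2)

# Global max pooling discards position, so the pooled response is unchanged.
assert max(out_shifted) == max(out)
```

The same three kernel weights are reused at every position, which is where the built-in assumption about translation comes from.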


candiodari 1214 days ago

> CNN are basically the simplest way to encode translational invariance

That's the geometric deep learning theory, isn't it? Do you know if there's a list somewhere of which invariances can be encoded in which ways? Like an overview?


redox99 1215 days ago

Yes, it would, or at least it would learn a similar operation.

The point of using a CNN instead of an FCN is that you force it to learn in a certain way that prevents overfitting. But given a sufficient dataset and proper data augmentation, you would expect an FCN to be able to identify objects regardless of translation. It's just that a CNN would train more easily and better, with a smaller network (an FCN doing convolutions would be very wasteful).
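
A rough sketch of the "wasteful" point, assuming a 1D input of length 8, a 3-tap kernel, and wrap-around edges (the function names here are hypothetical): a fully connected layer can represent the convolution exactly, but only by storing the same few weights over and over in a mostly-zero matrix.

```python
def conv1d_circular(signal, kernel):
    """Cross-correlate signal with kernel, wrapping around at the edges."""
    n = len(signal)
    return [
        sum(kernel[j] * signal[(i + j) % n] for j in range(len(kernel)))
        for i in range(n)
    ]


def conv_as_dense_matrix(kernel, n):
    """Build the n x n dense weight matrix that matches conv1d_circular."""
    w = [[0] * n for _ in range(n)]
    for i in range(n):
        for j, kj in enumerate(kernel):
            w[i][(i + j) % n] += kj  # same kernel weights repeated on every row
    return w


def dense(w, x):
    """Plain fully connected layer without bias: one dot product per row."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


x = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, 2, 1]

w = conv_as_dense_matrix(kernel, len(x))

# The dense layer reproduces the convolution exactly...
assert dense(w, x) == conv1d_circular(x, kernel)

# ...but it has n * n free weights where the convolution has len(kernel).
dense_params = len(w) * len(w[0])
conv_params = len(kernel)
print(dense_params, conv_params)
```

In this toy case the dense layer has 64 weights against 3 for the kernel, and an FCN would have to discover that tied, sparse structure from data rather than getting it for free.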

That's why traditionally you would pick your architecture to help it learn in a certain way (images=cnn, text=rnn/lstm/gru). But the nice thing about transformers is that they are more general.
