HN Mirror



	by dontwearitout 855 days ago
	The transformer module currently dominates ML and is widely used in text, vision, audio, and video models. It was introduced in 2017, shows no real signs of being displaced, and has no convolutions. https://en.wikipedia.org/wiki/Transformer_(deep_learning_arc... http://jalammar.github.io/illustrated-transformer/

2 comments

CamperBob2 855 days ago

If they use dot products on at least one layer with fully-connected inputs, which they do, along with everything else derived from the basic MLP model, then they're technically performing convolution.

Of course, the convolution concept breaks down when nonlinear activation functions are introduced, so I'm not sure the equivalence is really all that profound.
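
One way to make the dot-product claim concrete is the small sketch below. It is not from the thread, and the helper names `dot` and `conv_valid` are hypothetical. It computes a 1-D discrete convolution without padding as a series of dot products between the reversed kernel and sliding windows of the input. When the kernel is as long as the input there is only one window, so the result collapses to a single dot product, which is roughly the sense in which a fully-connected unit can be read as a degenerate convolution.

```python
def dot(u, v):
    """Plain dot product of two equal-length sequences."""
    assert len(u) == len(v)
    return sum(a * b for a, b in zip(u, v))


def conv_valid(signal, kernel):
    """1-D discrete convolution in 'valid' mode (no padding).

    Each output sample is a dot product between the reversed kernel
    and one window of the signal.
    """
    k = len(kernel)
    flipped = kernel[::-1]
    return [dot(signal[i:i + k], flipped) for i in range(len(signal) - k + 1)]


signal = [1, 2, 3, 4]
kernel = [1, 0, -1]

# Sliding the kernel yields one dot product per window.
print(conv_valid(signal, kernel))  # [2, 2]

# With a kernel as long as the input there is a single window,
# so the "convolution" is just one dot product.
weights = [4, 3, 2, 1]
print(conv_valid(signal, weights[::-1]) == [dot(signal, weights)])  # True
```

The general case differs from a fully-connected layer in that the same kernel is reused across every window. A dense layer has no such weight sharing, so the equivalence only holds in the single-window case shown last.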


dontwearitout 853 days ago

I don't think a dot product between high-dimensional vectors is considered a convolution? I'm familiar with convolution between continuous functions, and with kernels in neural networks providing invariance. I'd love to learn more if you have any links that expand on your statement.


adamnemecek 854 days ago

Nonlinear activation layers are piecewise linear-ish.
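
A quick sketch of what "piecewise linear" means here, taking ReLU as the example. The comment does not name an activation, so ReLU is an assumption, and `relu` and `is_additive` are hypothetical helper names. Additivity, one of the two properties of a linear map, holds when both inputs fall on the same side of zero and fails when they straddle the kink.

```python
def relu(x):
    """Rectified linear unit: identity for x > 0, zero otherwise."""
    return max(0.0, x)


def is_additive(x, y):
    """Check whether relu(x + y) == relu(x) + relu(y) for this pair."""
    return relu(x + y) == relu(x) + relu(y)


# Same side of zero: ReLU behaves like a linear map.
print(is_additive(2.0, 3.0))    # True  (5 == 2 + 3)
print(is_additive(-2.0, -3.0))  # True  (0 == 0 + 0)

# Opposite sides of zero: linearity breaks at the kink.
print(is_additive(2.0, -3.0))   # False (0 != 2 + 0)
```

Smooth activations such as sigmoid or tanh are not piecewise linear in this strict sense, which is presumably where the "-ish" comes in.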


adamnemecek 855 days ago

… or does it? https://grlearning.github.io/papers/11.pdf


dontwearitout 853 days ago

Lots of interesting work out there; time will tell!
