	by vore 1232 days ago
	That's not what a transformer model is: a transformer model is just one that uses self-attention blocks in its layers to encode contextual information about the input. A non-transformer model can equally translate from one representation to another: e.g. before transformers, RNNs were a commonly used architecture for seq2seq models.