| HN Mirror



	by swyx 1057 days ago
	> Transformers can be generally categorized into one of three categories: “encoder only” (a la BERT); “decoder only” (a la GPT); and having an “encoder-decoder” architecture (a la T5). Although all of these architectures can be rigged for a broad range of tasks (e.g. classification, translation, etc), encoders are thought to be useful for tasks where the entire sequence needs to be understood (such as sentiment classification), whereas decoders are thought to be useful for tasks where text needs to be completed (such as completing a sentence). Encoder-decoder architectures can be applied to a variety of problems, but are most famously associated with language translation.

	There's a whole lot of "thought to be"s here. Is there a proper study of the relative effectiveness of encoder-only vs. decoder-only vs. encoder-decoder architectures for various tasks?
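
For readers who want the distinction in the quote made concrete: one common way to see it is through the attention mask each architecture uses. The following is a minimal sketch in plain Python, not code from the quoted article. The function names (`attention_mask`, `cross_attention_mask`, `show`) are hypothetical and chosen for illustration only. It assumes the usual setup where an encoder attends bidirectionally, a decoder attends causally (each position sees only itself and earlier positions), and an encoder-decoder adds cross-attention from every target position to every source position.

```python
def attention_mask(n, causal):
    """Return an n x n mask; mask[i][j] is True when position i may attend to position j."""
    # Causal (decoder-style): position i sees only positions j <= i.
    # Non-causal (encoder-style): every position sees the whole sequence.
    return [[(j <= i) if causal else True for j in range(n)] for i in range(n)]


def cross_attention_mask(n_target, n_source):
    """Encoder-decoder cross-attention: each target position sees every source position."""
    return [[True] * n_source for _ in range(n_target)]


def show(mask):
    """Render a mask as text, 'x' for allowed and '.' for blocked."""
    return "\n".join("".join("x" if ok else "." for ok in row) for row in mask)


encoder_mask = attention_mask(4, causal=False)  # BERT-style self-attention
decoder_mask = attention_mask(4, causal=True)   # GPT-style self-attention
cross_mask = cross_attention_mask(3, 4)         # T5-style decoder looking at encoder output

print("encoder-only:")
print(show(encoder_mask))
print("decoder-only:")
print(show(decoder_mask))
```

The full mask is why an encoder can use the entire sequence when classifying it, and the lower-triangular mask is why a decoder can be trained to complete text one token at a time. This says nothing about which architecture performs better empirically, which is the open question here.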

2 comments

dsubburam 1057 days ago

'Formal Algorithms for Transformers'[1] is a proper account of the architectures and what tasks they naturally lend themselves to, by authors from DeepMind. See sections 3 (Transformers and Typical Tasks) and 6 (Transformer Architectures).

Not much on empirical observations, though.

[1] https://arxiv.org/abs/2207.09238


swyx 1057 days ago

ty!


pseudonom- 1057 days ago

There's some discussion in section 3.2 of https://arxiv.org/pdf/1910.10683.pdf


swyx 1057 days ago

ty!
