by swyx
1057 days ago
> Transformers can be generally categorized into one of three categories: “encoder only” (a la BERT); “decoder only” (a la GPT); and having an “encoder-decoder” architecture (a la T5). Although all of these architectures can be rigged for a broad range of tasks (e.g. classification, translation, etc), encoders are thought to be useful for tasks where the entire sequence needs to be understood (such as sentiment classification), whereas decoders are thought to be useful for tasks where text needs to be completed (such as completing a sentence). Encoder-decoder architectures can be applied to a variety of problems, but are most famously associated with language translation.

There's a whole lot of "thought to be"s here. Is there a proper study of the relative effectiveness of encoder-only vs decoder-only vs encoder-decoder for various tasks?
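For anyone skimming: the structural difference behind the three categories in the quote largely comes down to which positions each token is allowed to attend to. Here is a rough sketch of that, in plain Python with no ML library. The function names (`bidirectional_mask`, `causal_mask`, `cross_mask`, `show`) are made up for illustration and are not taken from the paper or from any framework, and this says nothing about which architecture performs better on a given task.

```python
def bidirectional_mask(n):
    """Encoder-style self-attention: every position may attend to every position."""
    return [[True] * n for _ in range(n)]


def causal_mask(n):
    """Decoder-style self-attention: position i may attend only to positions j <= i."""
    return [[j <= i for j in range(n)] for i in range(n)]


def cross_mask(n_target, n_source):
    """Encoder-decoder cross-attention: each target position sees the whole source."""
    return [[True] * n_source for _ in range(n_target)]


def show(mask):
    """Render a mask as text: 'x' = may attend, '.' = masked out."""
    return "\n".join("".join("x" if v else "." for v in row) for row in mask)


if __name__ == "__main__":
    print("encoder-only (bidirectional):")
    print(show(bidirectional_mask(4)))
    print("decoder-only (causal):")
    print(show(causal_mask(4)))
    print("encoder-decoder cross-attention (3 target x 4 source):")
    print(show(cross_mask(3, 4)))
```

The causal mask is what makes "completing text" the natural fit for decoders, and the full mask is what lets an encoder see the whole sequence at once. Whether that translates into measurable task differences is exactly the empirical question.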
Not much on empirical observations, though.
[1] https://arxiv.org/abs/2207.09238