by pests 399 days ago
One distinction is the original transformer was an encoder/decoder while (most?) LLMs today are encoder only.

The translation transformer was also able to peek ahead in the context window, while (most?) LLMs now only consider previous tokens.
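
To make the "peek ahead" point concrete, here is a rough sketch of the two attention masks involved. It is illustrative only: the helper name `attention_mask` is made up for this example, and real models apply the mask to attention scores rather than printing it.

```python
def attention_mask(seq_len, causal):
    """Return mask[i][j] = True if position i may attend to position j.

    Hypothetical helper for illustration, not taken from any library.
    """
    return [
        [(j <= i) if causal else True for j in range(seq_len)]
        for i in range(seq_len)
    ]


def render(mask):
    # "x" marks an allowed position, "." a masked-out one.
    return "\n".join(
        " ".join("x" if allowed else "." for allowed in row) for row in mask
    )


# Bidirectional: every token sees the whole window, including later tokens.
print(render(attention_mask(4, causal=False)))
print()
# Causal: token i only sees tokens 0..i, never anything after it.
print(render(attention_mask(4, causal=True)))
```

The causal mask comes out lower triangular, which is what "only consider previous tokens" means in practice.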

1 comment

They're usually thought of as "decoder only"
Oops yes thank you, was late when I replied.