Y
Hacker News
new
|
ask
|
show
|
jobs
by
PartiallyTyped
942 days ago
The most interesting thing in this whole saga is that decoder only models (aka causal transformers like GPT) are as effective as they are.