Hacker News
by
thfuran
620 days ago
It was still used as part of pretraining the current model.
namaria
620 days ago
Nonsense, the current model is a new architectural approach.
It was all explained in that recent paper, "Attention is all your meat"