Hacker News
by
namaria
620 days ago
That was a much smaller model; it couldn't do much more than crawl around and run away.
thfuran
620 days ago
It was still used as part of pretraining the current model.
namaria
620 days ago
Nonsense, the current model is a new architectural approach.
It was all explained in that recent paper, "Attention is all your meat"