by namaria 620 days ago
That was a much smaller model. It couldn't do much more than crawl around and run away.

It was still used as part of pretraining the current model.
Nonsense, the current model is a new architectural approach.

It was all explained in that recent paper, "Attention is all your meat".