Hacker News
by thfuran 620 days ago
It was still used as part of pretraining the current model.
1 comment

Nonsense, the current model is a new architectural approach.

It was all explained in that recent paper, "Attention is all your meat"