Hacker News new | ask | show | jobs
by macilacilove 910 days ago
The real invention is having a model that converges, does not lose the gradients, and having a framework to reliably train lots of parameters. Imo it is not historically accurate that next token prediction is the big invention. Some language models predict blanked tokens in the middle of sentences that are modern and interesting, and old language models like rnn-s are being used for quite some time, predicting next tokens one by one.