Hacker News
Predicting the Order of Upcoming Tokens Improves Language Modeling (arxiv.org)
7 points by wavelander 293 days ago
1 comment

Are any of these methods doable on pre-trained models? For example, could you freeze the model and train only these add-ons? Having to redo the training runs with these optimisations doesn't sound very practical in the grand scheme of things.
It's obviously practical for the next model you train from scratch. The point of research isn't to improve existing commercial products.
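For what it's worth, here is a minimal sketch of the freeze-and-train-add-ons setup asked about above, in plain Python with no ML framework. It is not the paper's method. The frozen "base model" is a hypothetical fixed embedding table, the add-on is a hypothetical linear next-token head trained with cross-entropy, and the toy vocabulary and data are made up. The only point it illustrates is that gradient updates touch the head's weights and never the base's.

```python
import math

VOCAB = 4   # toy vocabulary size (hypothetical)
HIDDEN = 3  # toy hidden size (hypothetical)

# Hypothetical frozen "pre-trained" base: a fixed embedding table.
# Nothing below writes to it; only the add-on head is trained.
BASE_EMBED = [
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
    [-1.0, -1.0, -1.0],
]

# Toy next-token data: token t is followed by (t + 1) % VOCAB.
PAIRS = [(t, (t + 1) % VOCAB) for t in range(VOCAB)]


def base_hidden(token):
    """Frozen forward pass: return a copy of the token's hidden vector."""
    return list(BASE_EMBED[token])


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


class AddOnHead:
    """Hypothetical trainable add-on: linear map from hidden state to vocab logits."""

    def __init__(self):
        self.W = [[0.0] * HIDDEN for _ in range(VOCAB)]
        self.b = [0.0] * VOCAB

    def logits(self, h):
        return [sum(w * x for w, x in zip(row, h)) + bias
                for row, bias in zip(self.W, self.b)]

    def sgd_step(self, h, target, lr):
        """One cross-entropy SGD step on the head only; returns the pre-update loss."""
        probs = softmax(self.logits(h))
        loss = -math.log(probs[target])
        for k in range(VOCAB):
            # d(loss)/d(logit_k) = p_k - 1[k == target]
            g = probs[k] - (1.0 if k == target else 0.0)
            for j in range(HIDDEN):
                self.W[k][j] -= lr * g * h[j]
            self.b[k] -= lr * g
        return loss


def train(pairs, epochs=200, lr=0.5):
    head = AddOnHead()
    losses = []  # mean training loss per epoch
    for _ in range(epochs):
        epoch_loss = sum(head.sgd_step(base_hidden(t), nxt, lr) for t, nxt in pairs)
        losses.append(epoch_loss / len(pairs))
    return head, losses


def predict(head, token):
    scores = head.logits(base_hidden(token))
    return max(range(VOCAB), key=lambda k: scores[k])


if __name__ == "__main__":
    head, losses = train(PAIRS)
    print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
    print([predict(head, t) for t in range(VOCAB)])
```

Whether a head trained like this on a frozen model would recover the gains reported in the paper is not something a toy like this can show; that is exactly the open question in the comment.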