Predicting the Order of Upcoming Tokens Improves Language Modeling

	Predicting the Order of Upcoming Tokens Improves Language Modeling (arxiv.org)
	7 points by wavelander 293 days ago

NitpickLawyer 293 days ago

Are any of these methods doable on pre-trained models? Like, freeze the model and only train these add-ons? Having to redo the training runs with these optimisations doesn't sound too practical, in the grand scheme of things.

impossiblefork 292 days ago

It's obviously practical for the next model you train from scratch. The point of research is not to improve existing commercial products.
