HN Mirror



	by cocogoatmain 196 days ago
	Want to also add that after its pretraining the model doesn't know how to respond in a user -> assistant style conversation; it's a pure text predictor (look at the open-source base models). There's also what is being called mid-training, where the model is trained on high(er)-quality traces, which acts as a bridge between pre- and post-training.

1 comments

amypetrik8 196 days ago

Just to go off of this, there is also the stochastic random overfit retraining process (SRORP). The idea behind SRORP is to avoid overfitting. SRORP takes data points from -any- stage of the earlier process, sampled with replacement, and trains usually 3-9 bootstrap models on random resamples. The median is then taken across all the models' weights to wipe out outliers. This SRORP polishing, -if done carefully-, is usually good for a 3-4% gain on all benchmarks.
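
For anyone who wants to see the mechanics, here is a rough sketch of the resample-then-take-the-median idea as described above, on a toy one-variable linear model rather than a neural network. Everything in it is an assumption for illustration: the names `fit_line` and `median_of_bootstrap_fits`, the default of 5 models, and the use of ordinary least squares as the "training" step are all hypothetical, and it says nothing about whether the quoted benchmark gain holds.

```python
import random
import statistics


def fit_line(points):
    """Ordinary least squares for y = w*x + b on a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    if var_x == 0:
        # Degenerate resample (every x identical): fall back to a flat line.
        return 0.0, mean_y
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    w = cov_xy / var_x
    return w, mean_y - w * mean_x


def median_of_bootstrap_fits(points, n_models=5, seed=0):
    """Fit n_models on bootstrap resamples, then take the per-weight median.

    Hypothetical helper: n_models=5 is just a value inside the 3-9 range
    mentioned in the comment, not a recommended setting.
    """
    rng = random.Random(seed)
    fits = []
    for _ in range(n_models):
        # Sampling WITH replacement, same size as the original data set.
        resample = rng.choices(points, k=len(points))
        fits.append(fit_line(resample))
    # Median of each weight across models, so a single odd fit cannot dominate.
    w = statistics.median(f[0] for f in fits)
    b = statistics.median(f[1] for f in fits)
    return w, b


if __name__ == "__main__":
    # Points on y = 2x + 1, plus one gross outlier.
    data = [(float(x), 2.0 * x + 1.0) for x in range(20)] + [(5.0, 500.0)]
    print(median_of_bootstrap_fits(data, n_models=9, seed=1))
```

With a real network the weights of separately trained models are not generally aligned with each other, so taking a per-weight median is a much bigger assumption there than in this toy case.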
