| HN Mirror

	by pama 275 days ago
	I agree. A partial counterexample is the RL training loop on verifiable tasks, which uses the model in a loop to generate training data. Another is the cleanup and prioritization of pretraining data using earlier models. More generally, a lot of ideas were proposed based on very tiny models in controlled settings, and they didn't pan out in real LLMs. There is probably a minimum compute threshold for overcoming generalization traps.