| HN Mirror

	by mlpro 188 days ago
	They are not trained on the same data. Even a skim of the paper shows that the LLMs are finetuned on very disjoint data: I checked, and some are finetuned on Chinese while others are for math. The pretrained model provides a good initialization. I'm convinced.