| HN Mirror

	by laidoffamazon 342 days ago
	Interesting. My assumption was that one of the innovations of DeepSeek and the modern GPT models was performing low-precision pretraining, rather than just finetuning further. I didn't realize you still need accumulation at a higher precision anyway.
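
	For anyone wondering why the accumulator matters, here is a minimal sketch of the general effect. It is not taken from DeepSeek's or any GPT training code; the helper names (`to_fp16`, `to_fp32`, `accumulate`) and the choice of summing 10,000 copies of 0.01 are assumptions made purely for illustration. It emulates IEEE 754 half and single precision with Python's `struct` module and compares a running sum kept in each format.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_fp32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def accumulate(values, round_acc):
    """Sum values one at a time, rounding the running total after every add.

    round_acc emulates the precision of the accumulator register.
    """
    acc = 0.0
    for v in values:
        acc = round_acc(acc + v)
    return acc


# Hypothetical workload: many small low-precision terms, as in a long dot product.
N = 10_000
values = [to_fp16(0.01)] * N  # the inputs themselves are stored in fp16

reference = sum(values)                  # double-precision reference sum
fp16_sum = accumulate(values, to_fp16)   # low-precision accumulator
fp32_sum = accumulate(values, to_fp32)   # higher-precision accumulator

print(f"reference (float64 accumulator): {reference:.4f}")
print(f"fp16 accumulator:                {fp16_sum:.4f}")
print(f"fp32 accumulator:                {fp32_sum:.4f}")
```

	The inputs are identical in both runs; only the accumulator width changes. Once the fp16 running total grows large enough that a single term is smaller than half the spacing between adjacent fp16 values, each further add rounds back to the same total and the sum stops growing. The fp32 accumulator stays close to the reference. Real mixed-precision kernels are more involved (block-wise scaling, hardware-specific accumulators), so treat this only as an illustration of the rounding behavior.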