HN Mirror



	by obmelvin 512 days ago
	Agreed. They accomplished a lot with distillation and optimization, but there's little reason to believe you don't also need foundation models to keep advancing. Otherwise, won't they run into issues training on more and more synthetic data? In a way, this is something most companies have been doing with their smaller models; DeepSeek just supposedly* did it better.