| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by JoshTko 509 days ago
	This is exactly where Deepseeks enhancements come into play. Essentially deepseek lets the model think out loud via chain of thought (o1 and Claude also do this) but DS also does not supervise the chain of thought, and simply rewards CoT that get the answer correctly. This is just one of the half dozen training optimization that Deepseek has come up with.