| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by Der_Einzige 451 days ago
	Me being old man yelling at cloud about how your chat/tool template matters more than your post-training technique. DeepSeek-R1 is trivially converted back to a non reasoning model with just chat template modifications. I bet you can chat template your way into a good quality model from a base model, no RLHF/DPO/SFT/GRPO needed.