Hacker News new | ask | show | jobs
by Der_Einzige 404 days ago
Me being old man yelling at cloud about how your chat/tool template matters more than your post-training technique.

DeepSeek-R1 is trivially converted back to a non reasoning model with just chat template modifications. I bet you can chat template your way into a good quality model from a base model, no RLHF/DPO/SFT/GRPO needed.