by reissbaker 584 days ago

Beyond just RAG, I'm fairly bullish on finetuning. For example, Qwen2.5-Coder-32B-Instruct is much better than Qwen2.5-72B-Instruct at coding, despite simply being a smaller version of the same model, finetuned on code. The Coder model is on par with Sonnet 3.5 and 4o on most benchmarks, whereas the simple chat-tuned 72B model is much weaker.

And while Qwen2.5-Coder-32B-Instruct is a pretty advanced finetune (it was trained on an extra 5 trillion tokens), even smaller finetunes have done really well. For example, Dracarys-72B, a simpler finetune of Qwen2.5-72B that used a modified version of DPO on a handmade set of answers to GSM8K, ARC, and HellaSwag, significantly outperforms the base Qwen2.5-72B model on the aider coding benchmarks.

There's a lot of intelligence we're leaving on the table, because everyone is just prompting generic chat-tuned models! If you tune a model to do something else, it'll be really good at that something else.