Hacker News
by dedicate 383 days ago
Interesting points! I'm always curious, though – beyond the theoretical benefits, has anyone here actually found a super specific, almost niche use case where fine-tuning blew a general model out of the water in a way that wasn't just about slight accuracy bumps?
3 comments

Yup! I'll have to write some of these up, and I can probably publish open datasets and evals too. If you have use cases you'd like to see, let me know! Some quick examples (task-specific performance):

- Fine-tuning improved Llama 70B's score from 3.62/5 (worse than Gemma 2B) to 4.27/5 (better than GPT-4.1), as measured by task-specific evals

- Valid JSON generation improved from a <1% success rate to >95% after fine-tuning
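
Both numbers above are task-specific eval metrics. As a rough illustration of how such metrics could be computed, here is a minimal Python sketch. The function names are hypothetical, and it assumes "valid" simply means the output parses with `json.loads` and that each eval example already carries a 1-to-5 rubric score from a human rater or an LLM judge; the actual eval setup behind these numbers isn't described.

```python
import json
from statistics import mean
from typing import List


def json_valid_rate(outputs: List[str]) -> float:
    """Fraction of model outputs that parse as JSON (hypothetical helper).

    Assumption: "valid" means json.loads() succeeds. A stricter eval would
    also validate against a schema; note that Python's json module accepts
    NaN/Infinity, which strict JSON does not.
    """
    if not outputs:
        return 0.0
    ok = 0
    for text in outputs:
        try:
            json.loads(text)
            ok += 1
        except json.JSONDecodeError:
            pass  # unparseable output counts as a failure
    return ok / len(outputs)


def mean_rubric_score(scores: List[int], scale_max: int = 5) -> float:
    """Average per-example rubric score on a 1..scale_max scale (hypothetical)."""
    if not scores:
        raise ValueError("no scores")
    for s in scores:
        if not 1 <= s <= scale_max:
            raise ValueError(f"score {s} is outside 1..{scale_max}")
    return mean(scores)


if __name__ == "__main__":
    outputs = [
        '{"name": "Ada"}',                   # valid
        'Sure! Here it is: {"name": "Ada"}',  # chatty preamble -> invalid
        '{"name": "Ada"',                    # truncated -> invalid
    ]
    print(f"valid JSON: {json_valid_rate(outputs):.0%}")         # valid JSON: 33%
    print(f"mean score: {mean_rubric_score([4, 5, 4]):.2f}/5")  # mean score: 4.33/5
```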

You can also optimize for cost and speed. I often see a 4x speedup and a 90%+ cost reduction while matching task-specific quality.

Don't you get a 100% valid-JSON success rate with constrained decoding on any model?
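
On the constrained-decoding question: the idea is that at each generation step the decoder masks out every token that could not lead to a valid output, so the result is valid by construction regardless of the model. A toy Python sketch follows. The 7-token vocabulary, the fixed `toy_scores` function standing in for a model, and the two allowed outputs are all hypothetical; real implementations typically derive the mask from a JSON schema or grammar rather than an explicit list of outputs.

```python
import json
from typing import Callable, Dict, List, Set

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of tokens.
VOCAB: List[str] = ['Sure! ', '```', '{"label": ', '"positive"', '"negative"', '"maybe"', '}']

# Hypothetical constraint: the only outputs we accept.
VALID: Set[str] = {'{"label": "positive"}', '{"label": "negative"}'}


def toy_scores(prefix: str) -> Dict[str, float]:
    """Stand-in for a model's next-token scores (ignores prefix for simplicity).
    Unconstrained greedy decoding would start with chatty preamble."""
    return {'Sure! ': 5.0, '```': 4.0, '"maybe"': 3.0, '{"label": ': 2.0,
            '"negative"': 1.5, '"positive"': 1.0, '}': 0.5}


def constrained_greedy_decode(
    score_fn: Callable[[str], Dict[str, float]],
    vocab: List[str],
    valid_outputs: Set[str],
    max_steps: int = 16,
) -> str:
    """Greedy decoding where each step may only pick tokens that keep the
    text a prefix of some allowed output."""
    text = ""
    for _ in range(max_steps):
        if text in valid_outputs:
            return text
        # Token mask: drop every token that would leave the set of valid prefixes.
        allowed = [t for t in vocab
                   if any(v.startswith(text + t) for v in valid_outputs)]
        if not allowed:
            raise ValueError(f"no valid continuation for {text!r}")
        scores = score_fn(text)
        text += max(allowed, key=lambda t: scores[t])
    if text in valid_outputs:
        return text
    raise ValueError("max_steps reached before a complete output")


if __name__ == "__main__":
    out = constrained_greedy_decode(toy_scores, VOCAB, VALID)
    print(out)              # {"label": "negative"}
    print(json.loads(out))  # {'label': 'negative'}
```

The model's top choice, `'Sure! '`, is never emitted, and the output always parses. Note what the mask does not guarantee: which valid output gets chosen still depends on the model's scores, so constrained decoding fixes syntax but not whether the content is right.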

Fine-tuning is also about reducing costs. If you can bake half the prompt into the model through fine-tuning, this can halve your running costs.
As an example, Genatron is made possible by fine-tuning: it needs to generate entire applications that are valid. It's similar to the valid JSON example, where you teach specific concepts through examples to ensure syntactically and semantically correct output.
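
The point about baking half the prompt into the model is easy to sanity-check with back-of-the-envelope arithmetic. Below is a minimal sketch with hypothetical per-token prices and hypothetical helper names. It assumes the fine-tuned model is billed at the same per-token rate as the base model (providers may price fine-tuned inference differently). Under that assumption, halving the prompt halves the cost exactly only when output tokens are a negligible share of the bill.

```python
def request_cost(prompt_tokens: float, output_tokens: float,
                 input_price: float, output_price: float) -> float:
    """Cost of one request, with prices given per 1M tokens (hypothetical units)."""
    return (prompt_tokens * input_price + output_tokens * output_price) / 1_000_000


def prompt_savings(prompt_tokens: float, output_tokens: float,
                   input_price: float, output_price: float,
                   fraction_removed: float = 0.5) -> float:
    """Fractional cost saving when `fraction_removed` of the prompt is baked
    into the model. Assumes per-token prices are unchanged after fine-tuning."""
    before = request_cost(prompt_tokens, output_tokens, input_price, output_price)
    after = request_cost(prompt_tokens * (1 - fraction_removed), output_tokens,
                         input_price, output_price)
    return 1 - after / before


if __name__ == "__main__":
    # Hypothetical prices: 1.0 per 1M input tokens, 4.0 per 1M output tokens.
    print(f"{prompt_savings(4000, 0, 1.0, 4.0):.0%}")    # 50%: cost is prompt-only
    print(f"{prompt_savings(4000, 200, 1.0, 4.0):.0%}")  # 42%: output tokens cost the same as before
```

With longer outputs or a pricier fine-tuned endpoint, the saving shrinks accordingly.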