Hacker News new | ask | show | jobs
by scosman 384 days ago
Yup! I'll have to write some of these up. I can probably do open datasets and evals too. If you have use cases you'd like to see let me know! Some quick examples (task specific performance):

- fine-tuning improved performance of Llama 70B from 3.62/5 to (worse than Gemma 2B) to 4.27/5 (better than GPT 4.1), as measured by evals

- Generating valid JSON improved from <1% success rate to >95% after tuning

You can also optimize for cost/speed. I often see a 4x speedup and reducing costs by 90%+, while matching task-specific quality.

1 comments

Don't you get valid JSON success rate of 100% with constrained decoding with any model?