|
|
|
|
|
by scosman
384 days ago
|
|
Yup! I'll have to write some of these up. I can probably do open datasets and evals too. If you have use cases you'd like to see let me know! Some quick examples (task specific performance): - fine-tuning improved performance of Llama 70B from 3.62/5 to (worse than Gemma 2B) to 4.27/5 (better than GPT 4.1), as measured by evals - Generating valid JSON improved from <1% success rate to >95% after tuning You can also optimize for cost/speed. I often see a 4x speedup and reducing costs by 90%+, while matching task-specific quality. |
|