Show HN: Distilled 0.6B text-to-SQL model
(github.com)
5 points
by maciejgryka
148 days ago
We used our platform to fine-tune a tiny text-to-SQL model using distillation from DeepSeek V3. The repo has instructions for replicating this. This is definitely not the best-performing model of its kind out there! But I found it surprising how much we were able to get out of it: it lands a stone's throw away from a teacher 1000x its size! We also ran the same thing using the 4B Qwen and matched the teacher's accuracy, though there the size difference is merely 100x :) I find this pretty cool - obviously our distilled models can only do this one task and don't generalize, but that's often exactly what you want when you're building agentic systems. Happy to answer any questions!