Show HN: Distilled 0.6B text-to-SQL model (github.com)
5 points by maciejgryka 148 days ago
We used our platform to fine-tune a tiny text-to-SQL model using distillation from DeepSeek V3. Repo has instructions for how to replicate this.

This is definitely not the best-performing model of its kind out there! But I was surprised by how much we were able to get out of it: it's a stone's throw away from a teacher 1000x its size!

We also ran the same thing using the 4B Qwen and matched the teacher's accuracy, though there the size difference is merely 100x :)

I find this pretty cool. Obviously our distilled models can only do this one task and don't generalize, but that's often exactly what you want when you're building agentic systems.

Happy to answer any questions!