| So it looks like it scores 76.5% on SQL-Eval [0], a bit behind GPT-4 at 83% and sqlcoder-15b at 78%. What kind of applications would this be useful for? What can you build with an AI data science intern that's right 75% of the time? As a programmer who always has to look stuff up when I write SQL, I could definitely see asking something like this for a first draft of a query, but it seems like I'm slightly better off asking the bigger models in these one-off cases (and I can run a 15b easily on my 64GB M1). If I'm in a corporate setting I'm not going to leak my schema into OpenAI's training data, and there are definitely times when I'd want to run queries offline. Small/local models are great when you want to do a ton of queries (save $$). A mini data scientist that could be queried by non-technical folks would be awesome, but I wonder if there's a way to determine whether a query is falling in the 25% "incorrect" case... maybe there's a RAID-like consensus algorithm where you have multiple models interrogate each other's answers to get a higher overall success rate. Mostly thinking out loud :) but maybe y'all have more ideas. Congrats on the release, OP! [0]: https://github.com/defog-ai/sql-eval |
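One rough way to picture the consensus idea: collect several candidate queries for the same question (from different models, or several samples from one), run them all, and only trust an answer when a majority return the same rows. The sketch below is only an illustration under assumptions: the `orders` table, the candidate queries, and the `consensus` helper are all made up, and SQLite stands in for the real database. Agreement is not proof of correctness, since models can share the same mistake.

```python
import sqlite3
from collections import Counter

def result_fingerprint(conn, sql):
    """Run a query; return an order-insensitive fingerprint of its rows, or None on error."""
    try:
        rows = conn.execute(sql).fetchall()
    except sqlite3.Error:
        return None
    return tuple(sorted(rows, key=repr))

def consensus(conn, candidates, min_agreement=0.5):
    """Return (query, agreement) for the majority result, or (None, agreement) if no majority."""
    fingerprints = [result_fingerprint(conn, sql) for sql in candidates]
    valid = [fp for fp in fingerprints if fp is not None]
    if not valid:
        return None, 0.0
    winner, votes = Counter(valid).most_common(1)[0]
    agreement = votes / len(candidates)
    if agreement <= min_agreement:
        return None, agreement
    # Hand back the first candidate that produced the winning result.
    return candidates[fingerprints.index(winner)], agreement

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
INSERT INTO orders VALUES (1, 'a', 10), (2, 'a', 20), (3, 'b', 5);
""")

# Pretend these came from three different models answering "revenue per customer".
candidates = [
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer",
    "SELECT customer, SUM(amount) AS total FROM orders GROUP BY customer ORDER BY total",
    "SELECT customer, COUNT(amount) FROM orders GROUP BY customer",
]

chosen, agreement = consensus(conn, candidates)
no_majority, split = consensus(conn, ["SELECT 1", "SELECT 2"])
```

Here two of the three candidates agree, so the first one is returned; the two-way split at the end returns `None`, which is the case you would route to a human or a bigger model.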
2. Combining and slicing data is a craft, and doing it subtly wrong in one step can lead to fatal errors in the outcome.
And most importantly, it can be very difficult to notice. Numbers don't smell.
That is why I would be very hesitant to give any task that is even slightly more than trivial to an engine that fails 25% of the time.
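A tiny illustration of the kind of subtle error meant here, using a made-up `orders`/`shipments` schema in SQLite (all names and numbers are assumptions for the example). Both queries run without complaint and both return a plausible-looking total, but the join silently counts an order once per shipment.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders VALUES (1, 100), (2, 50);
-- Order 1 was shipped in two parcels.
INSERT INTO shipments VALUES (1, 1), (2, 1), (3, 2);
""")

# Looks reasonable, but order 1 is counted once per shipment row.
wrong = conn.execute("""
    SELECT SUM(o.amount)
    FROM orders o JOIN shipments s ON s.order_id = o.id
""").fetchone()[0]

# Restrict to shipped orders without multiplying rows.
right = conn.execute("""
    SELECT SUM(amount) FROM orders
    WHERE id IN (SELECT order_id FROM shipments)
""").fetchone()[0]

print(wrong, right)  # 250 150
```

Nothing about 250 looks wrong unless you already know the answer should be 150.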
But I guess that is the same as any other programming task. The difference is that other programming tasks require a lot of boilerplate where an AI can help, while SQL is much more direct.
Maybe it could be useful for questions that are closer to writing test cases, such as "how can I verify that my query is doing the right thing?"
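For what it's worth, such a check could look something like the sketch below: state an invariant the result has to satisfy and test the query against it. Everything here is hypothetical (the schema, the `check_query` helper, and the invariant itself), and passing a check like this only rules out one class of mistake.

```python
import sqlite3

def scalar(conn, sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

def check_query(conn, query_sql, invariants):
    """Return the names of the invariants that the query's scalar result violates."""
    value = scalar(conn, query_sql)
    return [name for name, check in invariants if not check(conn, value)]

# Hypothetical schema and data, purely for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
CREATE TABLE shipments (id INTEGER PRIMARY KEY, order_id INTEGER);
INSERT INTO orders VALUES (1, 100), (2, 50);
INSERT INTO shipments VALUES (1, 1), (2, 1), (3, 2);
""")

# Revenue from shipped orders can never exceed revenue from all orders.
invariants = [
    ("shipped revenue <= total revenue",
     lambda c, v: v <= scalar(c, "SELECT SUM(amount) FROM orders")),
]

# A join that duplicates orders with several shipments.
JOIN_QUERY = """
    SELECT SUM(o.amount)
    FROM orders o JOIN shipments s ON s.order_id = o.id
"""
# A filter that keeps one row per order.
FILTER_QUERY = """
    SELECT SUM(amount) FROM orders
    WHERE id IN (SELECT order_id FROM shipments)
"""

join_failures = check_query(conn, JOIN_QUERY, invariants)
filter_failures = check_query(conn, FILTER_QUERY, invariants)
```

The join version trips the invariant and the filter version does not, which is roughly the "smell" that the numbers on their own don't give you.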