by bob1029 1652 days ago
This demo sent us on a warpath today. We have a fairly clean SQL schema for which we need to craft a lot of queries that handle things like business logic, reporting and configuration.

If we could get even a 50% success rate on a reasonable starting point for the generated SQL each time, that would be the biggest value-add our organization has ever seen.

I think our use case is compelling because we have to implement the same SQL targets for every customer. The only variations are typically customer-specific parameters/codes/etc.

We also have a huge corpus of examples to pull from for training data.

We are thinking about initially implementing some higher-order views/functions in our SQL dialect to make things easier on ourselves with the GPT model. Complex joins across many tables seem to be something that would still elude these techniques. Most of our joins are of a very particular shape, so we can abstract the super nasty stuff away.
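To make the view idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module. The schema, the view name `customer_order_totals`, and the helper names are all hypothetical and stand in for whatever the real dialect and tables look like. The point is only that once a recurring join shape lives in a view, the query a model has to generate touches a single relation.

```python
import sqlite3

# Hypothetical schema: every table, column, and view name here is illustrative.
SCHEMA = """
CREATE TABLE customers (id INTEGER PRIMARY KEY, code TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER REFERENCES customers(id));
CREATE TABLE order_items (order_id INTEGER REFERENCES orders(id), amount REAL);

-- The view hides the recurring three-table join shape.
CREATE VIEW customer_order_totals AS
SELECT c.code AS customer_code, o.id AS order_id, SUM(i.amount) AS total
FROM customers c
JOIN orders o ON o.customer_id = c.id
JOIN order_items i ON i.order_id = o.id
GROUP BY c.code, o.id;
"""


def build_demo_db():
    """Create an in-memory database with the hypothetical schema and a few rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "ACME"), (2, "GLOBEX")])
    conn.executemany("INSERT INTO orders VALUES (?, ?)", [(10, 1), (11, 1), (12, 2)])
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?)",
        [(10, 5.0), (10, 7.5), (11, 1.0), (12, 3.0)],
    )
    return conn


def totals_for(conn, customer_code):
    """Query the view only; no joins appear in the statement itself."""
    rows = conn.execute(
        "SELECT order_id, total FROM customer_order_totals "
        "WHERE customer_code = ? ORDER BY order_id",
        (customer_code,),
    )
    return rows.fetchall()
```

Whether this actually helps a generative model is an open question; the sketch only shows that the target statement gets shorter and join-free.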

Worst case scenario, this concludes like my cynical mind assumes it will, but I am open to being surprised this time. We aren't going to put everything behind this; it's more of an "if it works..." kind of 1-2 week experiment.

3 comments

There are projects out there that do this.

Possibly relevant: https://yale-lily.github.io/spider

I briefly worked on a startup to commercialize this tech, but we decided it wasn't accurate enough to be useful. It was very cool when it actually worked. If you can only produce what you want half the time on simple queries, that doesn't seem very useful to me though.

Can you elaborate on what kind of use cases you were trying to tackle using NL2SQL in that startup? Who was the target audience/persona?

Any company that has a lot of data in an SQL system that they want to make sense of. The idea would be that business intelligence people, analysts, the CEO, anyone who needed answers could ask a question in plain English and hopefully get a response.

The success rate was just not good enough, even for relatively simple queries. You'd probably need to adjust the query 90% of the time, and the other 10% you couldn't even really trust that the answer was correct.
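For anyone wanting to put a number on "success rate", one rough way is to run each generated query and a hand-written reference query against the same database and compare the rows. The sketch below assumes SQLite via Python's `sqlite3`; the function names `same_result` and `success_rate` are made up for illustration. Matching rows on one database do not prove two queries are equivalent, which is part of the trust problem described above.

```python
import sqlite3
from collections import Counter


def same_result(conn, generated_sql, reference_sql):
    """Return True if both queries yield the same multiset of rows.

    A generated query that fails to execute counts as a miss.
    Row order is ignored, so ORDER BY differences are not penalized.
    """
    try:
        got = conn.execute(generated_sql).fetchall()
    except sqlite3.Error:
        return False
    expected = conn.execute(reference_sql).fetchall()
    return Counter(got) == Counter(expected)


def success_rate(conn, pairs):
    """Fraction of (generated, reference) pairs whose results match."""
    if not pairs:
        return 0.0
    hits = sum(same_result(conn, gen, ref) for gen, ref in pairs)
    return hits / len(pairs)
```

A result-comparison check like this only measures agreement on the data at hand, so it is best read as an upper bound on how often the generated SQL is actually right.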

Shoot me an email if you’d like. I’d be happy to share any learnings from what I built, if relevant.

Have you considered using templates? Can you elaborate on why they can't be used? (I guess there is no way to cleanly separate the parameters, but I may be wrong.)
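As a rough picture of the template approach, here is a minimal sketch with Python's `sqlite3` and named placeholders. The `orders` table, the template name `orders_by_status`, and the helper functions are hypothetical. The SQL text stays fixed per template and only the customer-specific values are bound at run time.

```python
import sqlite3

# Hypothetical template registry: one fixed SQL text per reporting target.
TEMPLATES = {
    "orders_by_status": (
        "SELECT id FROM orders "
        "WHERE customer_code = :customer_code AND status = :status "
        "ORDER BY id"
    ),
}


def run_template(conn, name, params):
    """Look up a template by name and execute it with bound parameters."""
    sql = TEMPLATES[name]
    return conn.execute(sql, params).fetchall()


def make_orders_db():
    """Create an in-memory database with a small, made-up orders table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER, customer_code TEXT, status TEXT)")
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(1, "ACME", "open"), (2, "ACME", "closed"), (3, "GLOBEX", "open")],
    )
    return conn
```

One known limit: placeholders bind values only, not table or column names, so this works cleanly only when customers differ in codes and parameters rather than in query structure.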