Hacker News
by lolpanda 870 days ago
I don't think any of these text-to-sql models are solving the right problem. The hard part is not syntax, or not knowing how to write a GROUP BY query. Most data scientists and engineers spend more time understanding the meaning of the data than writing the query. You cannot simply look at a 50-column table in Snowflake and guess what the columns are from their names. For example, we have 10 columns in one table, all named ...price. We have to go to the wiki or read the dbt definitions to find the actual meaning. I cannot trust any query these models produce, because they don't understand the data; they only understand the query syntax.
5 comments

At Databricks we have an LLM that is fine-tuned to address the problem you raise:

https://www.databricks.com/blog/announcing-public-preview-ai...

Many customers like it a lot, although in your case, if there are many pricing details, it may not be quite as accurate.

Can I ask how you fine-tune it, or could you be a bit more specific?
You are right, but you can use RAG to “teach” the AI about your schema. I did a write-up on my implementation [1].

[1]: https://www.sqlai.ai/posts/enhancing-ai-accuracy-for-sql-gen...
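
To make the RAG idea concrete, here is a minimal sketch. It is not the implementation from the linked write-up. It assumes you already have a plain-text description for each column (say, exported from the wiki or the dbt definitions), and it uses simple word overlap as a stand-in for embedding search. The table, column names, and descriptions are all made up for illustration.

```python
import re
from collections import Counter

# Hypothetical column documentation, e.g. exported from a wiki or dbt YAML.
COLUMN_DOCS = {
    "orders.list_price": "catalog price of the item before any discount",
    "orders.sale_price": "price the customer actually paid after discount",
    "orders.cost_price": "what we paid the supplier for the item",
    "orders.created_at": "timestamp when the order was placed",
}


def tokenize(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return up to k column names whose descriptions share words with the question.

    Word overlap is a toy stand-in for the embedding similarity a real
    RAG setup would use.
    """
    q = tokenize(question)
    scored = []
    for column, description in docs.items():
        d = tokenize(description)
        overlap = sum(min(q[t], d[t]) for t in q)
        scored.append((overlap, column))
    # Highest overlap first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [column for overlap, column in scored[:k] if overlap > 0]


def build_prompt(question, docs, k=2):
    """Put the retrieved column descriptions in front of the question."""
    lines = [f"- {c}: {docs[c]}" for c in retrieve(question, docs, k)]
    return "Relevant columns:\n" + "\n".join(lines) + f"\n\nQuestion: {question}\nSQL:"


if __name__ == "__main__":
    print(build_prompt("How much did customers pay after discount?", COLUMN_DOCS))
```

The point is only that the model sees what each ...price column means before it writes anything. The quality of the answer still depends on someone having written those descriptions down.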

Yeah, we've been working on this problem a good bit and I think text-to-sql is a dead end for analytical questions.

We (https://www.definite.app/) ended up abandoning text-to-sql in favor of answering questions with a semantic layer (which LLMs are far more effective against).

https://www.loom.com/share/a0d3c0e273004d7982b2aed24628ef40

So you don’t use AI to generate SQL to retrieve data, as you say on the website?
We ultimately generate SQL, but the LLM doesn't write it; the semantic layer does.
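
A rough sketch of that split, to show the shape of the idea. This is not Definite's actual implementation. It assumes the LLM is asked to emit a small structured request (a metric name plus dimension names), and a hand-written semantic layer turns that into SQL. The metric and dimension definitions, the `orders` table, and the `compile_query` function are all hypothetical.

```python
# Hypothetical semantic layer: each metric and dimension is defined once,
# by a person who knows what the columns mean.
METRICS = {
    "revenue": "SUM(orders.sale_price)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "order_month": "DATE_TRUNC('month', orders.created_at)",
    "country": "orders.country",
}
TABLE = "orders"


def compile_query(request):
    """Compile a structured request into SQL.

    `request` is what the LLM would produce, for example
    {"metric": "revenue", "dimensions": ["country"]}.
    Unknown names are rejected instead of being guessed at.
    """
    metric = request["metric"]
    dims = request.get("dimensions", [])
    if metric not in METRICS:
        raise ValueError(f"unknown metric: {metric}")
    unknown = [d for d in dims if d not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")

    select = [f"{DIMENSIONS[d]} AS {d}" for d in dims]
    select.append(f"{METRICS[metric]} AS {metric}")
    sql = f"SELECT {', '.join(select)} FROM {TABLE}"
    if dims:
        # Group by the positions of the dimension columns in the SELECT list.
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dims)))
    return sql


if __name__ == "__main__":
    print(compile_query({"metric": "revenue", "dimensions": ["country"]}))
```

The LLM only has to pick from a short list of named metrics and dimensions, so it can't pick the wrong ...price column. That choice was made once, when the metric was defined.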
I totally agree that the value is in understanding the data. However, as a tool, do you see value in being able to quickly skeleton out the query? Autocomplete in code is a reasonable analogy to me.
It seems like fine-tuning on queries that use the underlying schema would work; people are doing this.