| HN Mirror



	by lolpanda 870 days ago
	I don't think any of those text-to-SQL models are solving the right problem. The hard part is not the syntax, or not knowing how to write a GROUP BY query. Most data scientists and engineers spend more time understanding the meaning of the data. One cannot simply look at a 50-column table in Snowflake and guess what the columns are from their names. For example, we have 10 columns in one table, all named ...price. We have to go to the wiki or read the dbt definitions to find the actual meaning. I cannot trust any queries that models produce, because they don't understand the data; they only understand the query syntax.

5 comments

joshhart 870 days ago

At Databricks we have an LLM that is fine-tuned to address the problem you raise:

https://www.databricks.com/blog/announcing-public-preview-ai...

Many customers like it a lot, although in your case, with so many pricing columns, it may not be quite accurate.


l5870uoo9y 870 days ago

Can I ask how you fine-tune it, or could you be a bit more specific?


l5870uoo9y 870 days ago

You are right, but you can use RAG to “teach” the AI about your schema. I did a write-up on my implementation [1].

[1]: https://www.sqlai.ai/posts/enhancing-ai-accuracy-for-sql-gen...
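As a rough illustration of the RAG idea (a sketch only, not the implementation from the write-up in [1]): look up the column descriptions most relevant to the question and put them into the prompt. The data dictionary, column names, and word-overlap scoring below are all made up for illustration; a real setup would more likely use embeddings and a vector store.

```python
import re

# Hypothetical data dictionary: column name -> human-written description.
COLUMN_DOCS = {
    "orders.list_price": "Catalog price before any discount, in USD.",
    "orders.net_price": "Price the customer paid after discounts, in USD.",
    "orders.cost_price": "What we paid the supplier for the item, in USD.",
    "orders.created_at": "Timestamp when the order was placed (UTC).",
}


def tokenize(text):
    """Lowercase the text and return its set of alphabetic words."""
    return set(re.findall(r"[a-z]+", text.lower()))


def retrieve(question, docs, k=2):
    """Return the k (column, description) pairs sharing the most words with the question."""
    q = tokenize(question)
    ranked = sorted(
        docs.items(),
        key=lambda item: len(q & tokenize(item[0] + " " + item[1])),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question, docs, k=2):
    """Assemble a prompt that carries the retrieved column meanings."""
    context = "\n".join(
        f"- {name}: {desc}" for name, desc in retrieve(question, docs, k)
    )
    return f"Relevant columns:\n{context}\n\nQuestion: {question}\nWrite a SQL query."


if __name__ == "__main__":
    print(build_prompt("How much did customers pay after discounts?", COLUMN_DOCS))
```

The model still writes the SQL here; retrieval only gives it the column meanings it could not guess from names like ...price.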


mritchie712 870 days ago

Yeah, we've been working on this problem a good bit, and I think text-to-SQL is a dead end for analytical questions.

We (https://www.definite.app/) ended up abandoning text-to-SQL in favor of answering questions with a semantic layer (which LLMs are far more effective against).

https://www.loom.com/share/a0d3c0e273004d7982b2aed24628ef40


l5870uoo9y 870 days ago

So you don’t use AI to generate SQL to retrieve data, as you say on the website?


mritchie712 870 days ago

We ultimately generate SQL, but the LLM doesn't write it; the semantic layer does.
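To make that split concrete, here is a minimal sketch of the general idea (an assumption about how such a system could look, not Definite's actual implementation): the LLM only emits a structured request naming metrics and dimensions, and a semantic layer with vetted definitions compiles it into SQL. The metric names, table, and `compile_query` helper are hypothetical.

```python
# Hypothetical semantic layer: definitions maintained by people who know the data.
METRICS = {
    "revenue": "SUM(net_price)",
    "order_count": "COUNT(*)",
}
DIMENSIONS = {
    "month": "DATE_TRUNC('month', created_at)",
    "country": "country",
}
TABLE = "orders"


def compile_query(request):
    """Compile a structured request (e.g. JSON produced by an LLM) into SQL.

    Unknown metric or dimension names raise ValueError, so the model
    cannot invent columns or redefine a metric.
    """
    metrics = request.get("metrics", [])
    dims = request.get("dimensions", [])
    if not metrics:
        raise ValueError("at least one metric is required")
    for name in metrics:
        if name not in METRICS:
            raise ValueError(f"unknown metric: {name}")
    for name in dims:
        if name not in DIMENSIONS:
            raise ValueError(f"unknown dimension: {name}")

    select = [f"{DIMENSIONS[d]} AS {d}" for d in dims]
    select += [f"{METRICS[m]} AS {m}" for m in metrics]
    sql = f"SELECT {', '.join(select)} FROM {TABLE}"
    if dims:
        # Group by the positions of the dimension columns in the select list.
        sql += " GROUP BY " + ", ".join(str(i + 1) for i in range(len(dims)))
    return sql


if __name__ == "__main__":
    # The LLM's job is reduced to producing this small structure.
    print(compile_query({"metrics": ["revenue"], "dimensions": ["month"]}))
```

The design choice is that what "revenue" means is written down once in the layer, so a wrong answer from the model can only be a wrong choice among known metrics, not a wrong definition.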


edmundsauto 870 days ago

I totally agree that the value is in understanding the data. However, as a tool, do you see value in being able to quickly skeleton out the query? Autocomplete in code is a reasonable analogy to me.


danielmarkbruce 870 days ago

It seems like fine-tuning on queries that use the underlying schema would work; people are doing this.
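For what that could look like in practice, a small sketch of turning (question, SQL) pairs over one schema into JSONL training records. The schema, the example pair, and the `prompt`/`completion` field names are assumptions for illustration; the exact record format depends on the training framework used.

```python
import json

# Hypothetical schema and one hand-verified example query against it.
SCHEMA = "CREATE TABLE orders (id INT, net_price NUMERIC, created_at TIMESTAMP);"
EXAMPLES = [
    (
        "Total revenue in 2023",
        "SELECT SUM(net_price) FROM orders "
        "WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01';",
    ),
]


def to_jsonl(schema, examples):
    """Serialize each (question, sql) pair as one JSON object per line."""
    lines = []
    for question, sql in examples:
        record = {
            # The schema is repeated in every prompt so the model sees it in context.
            "prompt": f"Schema:\n{schema}\n\nQuestion: {question}\nSQL:",
            "completion": " " + sql,
        }
        lines.append(json.dumps(record))
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_jsonl(SCHEMA, EXAMPLES))
```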
