Depends. Where is the problem? Is it the quirkiness of SQL? Perhaps something like QUEL or Datalog would yield better results? Is it declarative programming that it struggles with? GPT, for example, seems to be optimized for Python in particular. Perhaps an imperative approach is easier for the LLM to understand? It doesn't even have to be a language suitable for humans. Perhaps it would fare better with something like SQLite's byte code language?
> If the goal is to talk to a SQL database
While being able to talk to an existing SQL database would no doubt simplify the problem in a lot of cases, which is of merit, I doubt that is the actual goal. The user doesn't care about the technology, as they say. Getting the expected results out of the database is undoubtedly the actual goal.
SQL as a target is all well and good if it works reliably, but the claim was that it doesn't. If some other target performs better, there is no need to cling to SQL. It is merely an implementation detail.
> there is no need to cling to SQL. It is merely an implementation detail.
It is, in fact, also the interface. To use your example of SQLite bytecode: once your tool generates it, there is no way to feed that into SQLite. The bytecode is an implementation detail, with SQL being the public interface.
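To make the interface point concrete, here is a minimal sketch using Python's standard `sqlite3` module. The `users` table is made up for illustration. `EXPLAIN` lets you read the bytecode SQLite compiles a statement into, but as far as I know the stock library has no public call that accepts such a listing back as input.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table, only here so the statement compiles.
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

# EXPLAIN returns one row per bytecode instruction:
# (addr, opcode, p1, p2, p3, p4, p5, comment)
program = conn.execute("EXPLAIN SELECT name FROM users WHERE id = 1").fetchall()
opcodes = [row[1] for row in program]

for addr, opcode, *operands in program:
    print(addr, opcode, operands)
```

The exact listing differs between SQLite versions, which is part of why it is treated as an internal detail rather than an interface.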
But, to stick with your example, you can then modify SQLite to accept byte code input – or straight up write your own database engine that uses said byte code. We already know how to solve that kind of problem. This is, comparatively speaking, child's play.
It is recognized that SQL as a target would theoretically provide a less labour-intensive path, since it integrates with what already exists, but that only holds if natural language to SQL gets solved, and if it is not so much harder to solve than an alternative target that the integration savings are lost.
A reasonable stretch goal, but if another target gets you there first, it would be foolhardy to cling to SQL. Replacing the database interface is a much simpler problem to solve.
I think the problem is the quirkiness on the English side, not the SQL side. You could translate Datalog to SQL or vice versa, but understanding intention from arbitrary English is much harder. And often query results must be 100% accurate and reliable.
> I think the problem is the quirkiness on the English side
While likely, the question asked whether any improvement has been shown with other targets, which would test that assumption. There is no benefit in merely thinking it.
> And often query results must be 100% accurate and reliable.
It seems that is impossible. Even human programmers struggle to reliably convert natural language to SQL, according to the aforementioned test study. They are slightly better than the known alternatives, but far from perfect. But if another target can get closer to human-level performance, that is significant.
When I find someone claiming a suspicious data analysis result I can ask them for the SQL and investigate it to see if there's a bug in it (or further investigate where the data being queried comes from). If the abstraction layer between LLM prompt and data back is removed, I'm left with (just like other LLM answers) some words but no way to know if they're correct.
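A small sketch of the kind of audit I mean, using Python's `sqlite3`. The `orders` and `shipments` tables and their numbers are invented for illustration. The suspicious total comes from a join that repeats an order once per shipment, which you can only spot because the query text is there to read.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, amount INTEGER);
    CREATE TABLE shipments (order_id INTEGER);
    INSERT INTO orders VALUES (1, 100), (2, 50);
    -- Order 1 was split into two shipments.
    INSERT INTO shipments VALUES (1), (1), (2);
""")

# The query behind the suspicious claim: the join yields one row per
# shipment, so order 1 is summed twice.
suspicious_sql = """
    SELECT SUM(o.amount) FROM orders o
    JOIN shipments s ON s.order_id = o.id
"""

# The fix after reading it: count each shipped order once.
fixed_sql = """
    SELECT SUM(amount) FROM orders
    WHERE id IN (SELECT order_id FROM shipments)
"""

suspicious_total = conn.execute(suspicious_sql).fetchone()[0]
fixed_total = conn.execute(fixed_sql).fetchone()[0]
print(suspicious_total, fixed_total)
```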
1. How would the abstraction be removed? Language generation is what LLMs do; a language abstraction is what you are getting out, no matter what. There is no magic involved.
2. The language has to represent a valid computer program. That is as true of SQL as any other target. You can know that it is correct by reading it.
Once you have SQL, you have Datalog. Once you have Datalog, you have SQL. The problem isn't the target; it is getting sufficiently rigorous and structured output from the LLM to target anything.
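As a rough illustration of that correspondence, here is the textbook `ancestor` rule pair from Datalog written as a recursive CTE and run through Python's `sqlite3`. The `parent` table and the names in it are made up, and this is a hand translation of one rule, not a general translator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (p TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)",
    [("ann", "bob"), ("bob", "cat"), ("cat", "dan")],
)

# Datalog:
#   ancestor(X, Y) :- parent(X, Y).
#   ancestor(X, Y) :- parent(X, Z), ancestor(Z, Y).
ANCESTOR_SQL = """
WITH RECURSIVE ancestor(a, d) AS (
    SELECT p, c FROM parent                      -- first rule
    UNION
    SELECT parent.p, ancestor.d                  -- second rule
    FROM parent JOIN ancestor ON parent.c = ancestor.a
)
SELECT d FROM ancestor WHERE a = ? ORDER BY d
"""

descendants = [row[0] for row in conn.execute(ANCESTOR_SQL, ("ann",))]
print(descendants)
```

Each Datalog rule becomes one branch of the `UNION`, and the shared variable `Z` becomes the join condition.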
So you already claimed, but, still, curiously we have no answer to the question. If you don't know, why not just say so?
That said, if you have ever used these tools to generate code, you will know that they are much better at some languages than others. In the general case, the target really is the problem sometimes. Does that carry into this particular narrow case? I don't know. What do the comparison results show?
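For what such a comparison could look like, here is a minimal sketch of an execution-accuracy score that does not care what the target language is: each target supplies its own `execute` function, and a generated program counts as correct only if it runs and returns the expected rows. All names, the toy table, and the candidate programs are hypothetical, and the numbers say nothing about how any real model performs.

```python
import sqlite3
from collections import Counter

def execution_accuracy(execute, cases):
    """Fraction of (program, expected_rows) cases whose output matches.

    execute: callable(program) -> iterable of row tuples; may raise.
    Rows are compared as multisets, so row order does not matter.
    """
    hits = 0
    for program, expected_rows in cases:
        try:
            got = Counter(execute(program))
        except Exception:
            continue  # a program that fails to run counts as a miss
        if got == Counter(expected_rows):
            hits += 1
    return hits / len(cases)

# Toy data: one table holding the values 1, 2, 3.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,), (3,)])
rows = conn.execute("SELECT x FROM t").fetchall()

expected = [(2,), (3,)]  # "which values are greater than 1?"

# Target A: SQL text, executed by SQLite.
sql_cases = [
    ("SELECT x FROM t WHERE x > 1", expected),   # correct
    ("SELECT x FROM t WHERE x >= 1", expected),  # runs, wrong rows
    ("SELEC x FROM t", expected),                # does not parse
]
sql_score = execution_accuracy(lambda q: conn.execute(q).fetchall(), sql_cases)

# Target B: plain Python callables over the same rows.
py_cases = [(lambda rs: [r for r in rs if r[0] > 1], expected)]
py_score = execution_accuracy(lambda f: f(rows), py_cases)

print(sql_score, py_score)
```

Scoring on returned rows rather than on program text is what would let different targets be compared on the same questions.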