by internet101010
871 days ago
> What kind of applications would this be useful for? What can you build with an AI data science intern that's right 75% of the time?

Yeah, this is the issue I have with all of the SQL generation stuff. Not only should the SQL be valid, but a prompt like "generate a query that pulls sales for the last quarter" should generate the same output for everyone, without fail. Vanna's business logic embedding is a good first step, but it is still only correct around 90% of the time with GPT-4. Even then, it will only work if there are strong standards and data governance structures in place that everyone within an organization is aligned on. For example, "sales" can mean different things to different people, and all of that needs to be buttoned up as well.

In either case, validation is the key step: you can't just trust that a SQL query is correct, regardless of whether you wrote it manually. You still have to go through the data and check it.
That's where the SQL generation stuff can save time: if 50% of the time you can get to an answer in half the time, then it's great! In my experience, when current-gen LLMs fail they usually fail quickly, so the other 50% of queries don't take twice as long to write manually.
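To put rough numbers on that trade-off, here is a back-of-the-envelope sketch. The cost model is an assumption for illustration, not a measurement: a successful generation costs some fraction of the manual time, and a failed one costs a wasted overhead plus the full manual rewrite. The function name and parameters are made up for this example.

```python
def expected_time(p_success, speedup_fraction, fail_overhead, manual_time=1.0):
    """Expected time per query, in units of the manual writing time.

    Assumed model (hypothetical):
      - success: the query takes speedup_fraction * manual_time
      - failure: fail_overhead * manual_time is wasted, then the query
        is written manually anyway
    """
    success_cost = speedup_fraction * manual_time
    failure_cost = (fail_overhead + 1.0) * manual_time
    return p_success * success_cost + (1.0 - p_success) * failure_cost


# Fails quickly: a failed attempt wastes 10% of the manual time.
fast_fail = expected_time(0.5, 0.5, 0.1)

# Fails slowly: a failed attempt takes twice as long as writing it manually.
slow_fail = expected_time(0.5, 0.5, 1.0)

# Break-even under these assumptions: failures waste half the manual time.
break_even = expected_time(0.5, 0.5, 0.5)

print(fast_fail, slow_fail, break_even)
```

Under these assumed numbers, the fast-fail case comes out around 0.8 of the manual time and the slow-fail case around 1.25, so the whole thing only pays off while failed attempts waste less than about half of the manual time.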
Then there is the other use case: if you aren't sure why a particular SQL query is erroring, these LLMs are great at telling you why and fixing your code.
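On the validation point, a minimal sketch of what "go through the data and check it" could look like: run the candidate query, then recompute the same number independently from the raw rows and compare. The `sales` table, its columns, the sample rows, and the `validate_quarter_sales` helper are all hypothetical, and SQLite is used only to keep the example self-contained.

```python
import sqlite3


def validate_quarter_sales(conn, candidate_sql, start, end):
    """Compare a candidate query's total with one recomputed from raw rows.

    Dates are ISO strings, so plain string comparison orders them correctly.
    Returns (matches, got, expected).
    """
    got = conn.execute(candidate_sql).fetchone()[0]
    rows = conn.execute("SELECT sold_on, amount FROM sales").fetchall()
    expected = sum(amount for sold_on, amount in rows if start <= sold_on < end)
    return got == expected, got, expected


# Hypothetical sample data: two rows inside Q4 2023, one on each side of it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (sold_on TEXT, amount INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [
        ("2023-09-30", 100),
        ("2023-10-01", 250),
        ("2023-12-31", 400),
        ("2024-01-01", 50),
    ],
)

good_sql = (
    "SELECT SUM(amount) FROM sales "
    "WHERE sold_on >= '2023-10-01' AND sold_on < '2024-01-01'"
)
# Plausible-looking but wrong: '>' silently drops the first day of the quarter.
bad_sql = (
    "SELECT SUM(amount) FROM sales "
    "WHERE sold_on > '2023-10-01' AND sold_on < '2024-01-01'"
)

good_result = validate_quarter_sales(conn, good_sql, "2023-10-01", "2024-01-01")
bad_result = validate_quarter_sales(conn, bad_sql, "2023-10-01", "2024-01-01")
print(good_result, bad_result)
```

Both queries are valid SQL and both return a number, which is the point: only the cross-check against the raw rows flags the second one. The same check applies whether the query came from an LLM or was written manually.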