by swimwiththebeat
880 days ago
I see so many business leaders touting the promise of LLMs letting businesses "talk" to their data. The promise sounds enticing, but it's hard to get working in practice. Many of our databases at work have columns with custom types and enums, and getting the LLM (Llama 2) to write SQL queries that robustly answer natural-language questions about the data is tough. It takes a lot of instruction prompting, context, and question-SQL examples (few-shot learning), and it still fails in unexpected ways. It's a tough ask for people to use a tool like this if they can't always trust the results. It's also not very feasible to scale this to tens or hundreds of tables across our data warehouse. It's great that so many people are trying to crack this problem, and I'm curious to try this model out. I'd also love to hear whether others have tried solving this problem and made any headway.
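For anyone wondering what "instruction prompting, context, and question-SQL examples" looks like concretely, here's a minimal sketch of how such a few-shot prompt might be assembled. It only builds the prompt string and doesn't call Llama 2 or any other model. The `orders` table, the `order_status` enum, and the example pair are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Example:
    """One hand-written question/SQL pair used as a few-shot demonstration."""
    question: str
    sql: str


def build_prompt(
    schema_ddl: str,
    enum_values: dict[str, list[str]],
    examples: list[Example],
    question: str,
) -> str:
    """Combine instructions, schema context, enum values and few-shot pairs into one prompt."""
    lines = [
        "Translate the question into one SQL query.",
        "Use only the tables, columns and enum values listed below.",
        "",
        "### Schema",
        schema_ddl.strip(),
    ]
    # List enum members explicitly, since the model can't infer them from a custom type name.
    if enum_values:
        lines += ["", "### Allowed enum values"]
        for column, values in enum_values.items():
            lines.append(f"- {column}: {', '.join(values)}")
    # Few-shot demonstrations: each question followed by its known-good SQL.
    lines += ["", "### Examples"]
    for ex in examples:
        lines += [f"Q: {ex.question}", f"SQL: {ex.sql}", ""]
    # End with the new question and an open "SQL:" slot for the model to complete.
    lines += [f"Q: {question}", "SQL:"]
    return "\n".join(lines)


# Hypothetical schema with a custom enum type, in the spirit of the comment above.
prompt = build_prompt(
    schema_ddl="CREATE TABLE orders (id INTEGER, status order_status, total DECIMAL(10, 2));",
    enum_values={"orders.status": ["pending", "shipped", "cancelled"]},
    examples=[
        Example("How many orders were cancelled?",
                "SELECT count(*) FROM orders WHERE status = 'cancelled';"),
    ],
    question="What is the average total of shipped orders?",
)
print(prompt)
```

Every extra table means more schema text, more enum lists, and more example pairs in that string, which is part of why this gets hard to scale across a whole warehouse.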
In that sense, I emphasized in our blog post that users should think of it as a documentation oracle that always gives you the exact DuckDB SQL snippet you're looking for. That's a tremendous time-saver if you have an abstract idea of the query you want to write but aren't sure about the syntax, especially since DuckDB has so many functions and SQL extensions.
Here are a few examples:
- create tmp table from test.csv
- load aws credentials from 'test' profile
- get max of all columns in rideshare table
- show query plan with runtimes for 'SELECT * FROM rideshare'
- cast hvfhs_license_num column to int
- get all columns ending with _amount from taxi table
- show summary statistics of rideshare table
- get a 10% reservoir sample of rideshare table
- get length of drivers array in taxi table
- get violation_type field from other_violations json column in taxi table
- get passenger count, trip distance and fare amount from taxi table and order by all of them
- list all tables in current database
- get all databases starting with test_
[edit: fixed list formatting]