| I think people misunderstand LLMs; you should think of them as humans with limited recall. It seems the author asked the model to retrieve a lot of data, which it is bound to get wrong: the training data might contain this information, but the model keeps only a lossy representation of it. The better question is whether it can generate SQL for this dataset that gives the answers you were looking for, which is how a human would approach this type of problem. I have been experimenting with the USDA food database, sending just the table-structure metadata to the LLM as a prompt so it can write SQL. My prompt is below:
----
You are a SQL Generator for USDA Food Database which is stored in sqlite. When generating SQL make sure to use :parameter_name for queries requiring parameters.
Here is the schema: {% for row in data %}
Table: {{ row.table_name }}
Columns:
{{ row.columns }}
{% endfor %}
You can generate python code to analyze the data only if user requests it, each python code block should be able to run in Jupyter cell fully self contained. Libraries such as matplotlib, numpy, seaborn are installed. You will get the previously executed sql queries by the user in <context> </context> tags. You can access this executed data from cache
```python
import cache
data = cache.get_data('query_hash')
```
The data in the above example is already a pandas DataFrame. Wait for the user to ask for questions before generating any queries.
----
You can try it out here: https://catalyst.voov.ai |
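
For anyone who wants to try the metadata-only idea, here is a minimal sketch of the two pieces the prompt relies on: collecting table names and columns from sqlite, and running SQL that uses `:parameter_name` placeholders. It is only an illustration under assumptions, not the code behind the linked site. The `food` table, its columns, its sample rows, and the helper names `schema_rows` and `render_schema` are all hypothetical, and the schema block is built with plain Python string formatting rather than the template engine used in the prompt.

```python
import sqlite3


def schema_rows(conn):
    """Return one dict per user table: {'table_name': ..., 'columns': ...}."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    ).fetchall()
    rows = []
    for (name,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk).
        info = conn.execute(f'PRAGMA table_info("{name}")').fetchall()
        columns = ", ".join(f"{col[1]} {col[2]}" for col in info)
        rows.append({"table_name": name, "columns": columns})
    return rows


def render_schema(rows):
    """Plain-Python stand-in for the template loop shown in the prompt."""
    return "\n".join(
        f"Table: {row['table_name']}\nColumns:\n{row['columns']}" for row in rows
    )


# Hypothetical stand-in table and made-up sample rows; the real USDA schema
# is not shown in the comment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE food (fdc_id INTEGER PRIMARY KEY, description TEXT, kcal REAL)"
)
conn.executemany(
    "INSERT INTO food VALUES (?, ?, ?)",
    [(1, "Apple, raw", 52.0), (2, "Butter, salted", 717.0)],
)

# Only this metadata text goes into the prompt, never the table contents.
schema_text = render_schema(schema_rows(conn))
print(schema_text)

# The kind of SQL the prompt asks for: named :parameter placeholders that are
# bound at execution time instead of being pasted into the string.
generated_sql = "SELECT description FROM food WHERE kcal > :min_kcal ORDER BY kcal"
result = conn.execute(generated_sql, {"min_kcal": 100}).fetchall()
print(result)
```

Keeping the values out of the SQL string means the same generated query can be rerun with different parameters, and the model never needs to see the data itself to write it.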