by isoprophlex
542 days ago
A pre-processing phase does a lot of the heavy lifting: we stuff the table and column comments, additional metadata, and some hand-tuned heuristics into a graph-like structure, basically using LLMs themselves to preprocess the schema metadata. Everything is very boring tech-wise: vanilla postgres/pgvector and a few hundred lines of python. Every RAG-searchable text field (mostly column descriptions and a list of LLM-generated example queries) is linked to nodes holding metadata, at most 2 hops out. The tool is available to 10,000 users, but load is only a few queries per minute at peak... so performance-wise it's fine.
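To make the "searchable text linked to metadata nodes, at most 2 hops out" idea concrete, here is a rough in-memory sketch. It is not the actual implementation: the node names, the `embed`/`retrieve` helpers, and the bag-of-words similarity are all made up for illustration, standing in for real embeddings stored in pgvector and a real graph stored in postgres tables.

```python
import math
from collections import Counter, deque

# Hypothetical toy schema graph: node id -> metadata (all names are invented).
NODES = {
    "table:orders": {"kind": "table", "comment": "customer orders"},
    "col:orders.total": {"kind": "column", "type": "numeric"},
    "col:orders.customer_id": {"kind": "column", "type": "int"},
    "table:customers": {"kind": "table", "comment": "registered customers"},
    "col:customers.name": {"kind": "column", "type": "text"},
}
# Undirected links between metadata nodes.
EDGES = [
    ("table:orders", "col:orders.total"),
    ("table:orders", "col:orders.customer_id"),
    ("col:orders.customer_id", "table:customers"),
    ("table:customers", "col:customers.name"),
]
# Searchable text fields, each anchored to exactly one node.
TEXTS = [
    ("col:orders.total", "total order amount in euros"),
    ("col:customers.name", "full name of the customer"),
    ("table:orders", "example query: revenue per month"),
]


def embed(text):
    """Toy bag-of-words 'embedding'; a real setup would use a model + pgvector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def within_hops(start, max_hops=2):
    """Breadth-first search: every node reachable in at most max_hops edges."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if seen[node] == max_hops:
            continue
        for a, b in EDGES:
            if node in (a, b):
                other = b if node == a else a
                if other not in seen:
                    seen[other] = seen[node] + 1
                    queue.append(other)
    return set(seen)


def retrieve(question, top_k=1, max_hops=2):
    """Rank text fields by similarity, then collect metadata near the best hits."""
    q = embed(question)
    ranked = sorted(TEXTS, key=lambda t: cosine(q, embed(t[1])), reverse=True)
    context = {}
    for node, _text in ranked[:top_k]:
        for n in within_hops(node, max_hops):
            context[n] = NODES[n]
    return context


if __name__ == "__main__":
    for node_id, meta in sorted(retrieve("total amount of an order").items()):
        print(node_id, meta)
```

The point of the hop limit is that a hit on one column description pulls in its table and sibling columns as prompt context, without dragging in the whole schema.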
[1] (https://github.com/eloquentanalytics/pyeloquent/blob/main/RE...)