| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by l5870uoo9y 817 days ago
	If you are looking for a text to sql solution, I can in all modesty recommend my own https://www.sqlai.ai. Adding schemas can be done in any format and it automatically parsed/optimized by AI for optimal performance.

1 comments

moqca 817 days ago

How does it handle large schemas? A quick schema dump shows that we have around 100K columns between all of our tables. Say we use 10% of that.

How easy is it to select which tables should be taken into account?

Is there an intermediate context layer step to work with this?

Do you need to provide working examples for the tool to know how joins and relations are usually handled?

link

l5870uoo9y 817 days ago

> How does it handle large schemas?

For now we don't particularly handle huge schema, being over 10-20k tokens. I would suggest adding only the tables you need. But I will hopefully this week test a solution that automatically pre-determines which tables to include for huge schemas before the actual SQL query generation and thereby using relatively few tokens.

> How easy is it to select which tables should be taken into account?

You can edit the schema using code editor (it will be in CSV). That is relatively easy. I previously had a solution where you could manually tick off tables to include, but found it a bit cumbersome. Might add that back though. Ideally I want everything you run without the user having to include/exclude tables.

> Is there an intermediate context layer step to work with this?

I will test adding this. Must also be performant in terms of speed and reliability.

> Do you need to provide working examples for the tool to know how joins and relations are usually handled?

Normally AI can infer that from the schema. If not you can "teach" AI using RAG: https://www.sqlai.ai/posts/enhancing-ai-accuracy-for-sql-gen...

link

lmeyerov 817 days ago

We are working with enterprise-scale schemas, and beyond SQLs like databricks, are doing Splunk, Elastic, graph DBs, etc, where many tables/cols is common. Part of the trick includes ideas like using embedding models and continuous learning for schema subsetting & tuning. We are deploying in govs, banks, insurance companies, etc, happy to chat if sounds relevant -- louie.ai . Notebooks, dashboarding, headless APIs, pen testing, audit logs, RBAC modes, a lot needed for enterprise settings!

link