If you are looking for a text to sql solution, I can in all modesty recommend my own https://www.sqlai.ai. Adding schemas can be done in any format and it automatically parsed/optimized by AI for optimal performance.
For now we don't particularly handle huge schema, being over 10-20k tokens. I would suggest adding only the tables you need. But I will hopefully this week test a solution that automatically pre-determines which tables to include for huge schemas before the actual SQL query generation and thereby using relatively few tokens.
> How easy is it to select which tables should be taken into account?
You can edit the schema using code editor (it will be in CSV). That is relatively easy. I previously had a solution where you could manually tick off tables to include, but found it a bit cumbersome. Might add that back though. Ideally I want everything you run without the user having to include/exclude tables.
> Is there an intermediate context layer step to work with this?
I will test adding this. Must also be performant in terms of speed and reliability.
> Do you need to provide working examples for the tool to know how joins and relations are usually handled?
We are working with enterprise-scale schemas, and beyond SQLs like databricks, are doing Splunk, Elastic, graph DBs, etc, where many tables/cols is common. Part of the trick includes ideas like using embedding models and continuous learning for schema subsetting & tuning. We are deploying in govs, banks, insurance companies, etc, happy to chat if sounds relevant -- louie.ai . Notebooks, dashboarding, headless APIs, pen testing, audit logs, RBAC modes, a lot needed for enterprise settings!
How easy is it to select which tables should be taken into account?
Is there an intermediate context layer step to work with this?
Do you need to provide working examples for the tool to know how joins and relations are usually handled?