| HN Mirror


by joshstrange 221 days ago

I'll freely admit you have more data (experience) to work with on this than I did in the tests I ran almost a year ago. I spent a lot of time documenting my schemas, feeding the LLM sample rows, etc., and the final results were not useful enough even as a starting point for a static query that a developer would improve on and "hard code" into a UI. I approached it as both:

- Wouldn't it be cool to let my users chat with their data? ("How many new users signed up today/this event/this month/etc?" or "How much did we make yesterday?")

- An internal tool to use as a starting point for analytics dashboards

I still use LLMs to help write queries when I know something can be done but can't remember the syntax, but I scrapped the project that was meant to accomplish both of the above goals because it made too many mistakes. Maybe my data is just too "dirty" (but honestly, I've never _not_ seen dirty data), and/or I should have cleaned up the deprecated columns in my tables that confused the models (strict instructions to ignore them weren't enough; I should have filtered them out completely). Either way, I spent way too much time repeating myself, talking in all caps, and generally fighting with the SOTA models to get them to understand my data well enough to generate queries that actually worked (worked as in returned valid data, not just valid SQL).

I wasn't doing any training/fine-tuning (which may be the magic needed), but it felt like a dead end given the models at the time. I'll also stress that I haven't re-tested those theories on newer models and my results are at least a year out of date (a lifetime in LLMs/AI), but the fundamental issues I ran into didn't seem to be "on the cusp" of being solved or anything like that.
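A minimal sketch of the "filter them completely" idea, assuming the schema is described to the model as plain text in the prompt. The `Column` class, the `build_schema_prompt` helper, and the example `users` table are all hypothetical, made up for illustration; they are not from the project described above. The point is only that deprecated columns never reach the model, rather than being listed with an instruction to ignore them.

```python
from dataclasses import dataclass


@dataclass
class Column:
    """Hypothetical description of one table column."""
    name: str
    sql_type: str
    description: str = ""
    deprecated: bool = False


def build_schema_prompt(table: str, columns: list[Column]) -> str:
    """Describe a table for an LLM prompt, omitting deprecated columns entirely."""
    lines = [f"Table {table}:"]
    for col in columns:
        if col.deprecated:
            # Drop the column instead of telling the model to ignore it.
            continue
        suffix = f" -- {col.description}" if col.description else ""
        lines.append(f"  {col.name} {col.sql_type}{suffix}")
    return "\n".join(lines)


# Hypothetical example table; names and types are assumptions.
users = [
    Column("id", "UInt64", "primary key"),
    Column("signed_up_at", "DateTime", "account creation time (UTC)"),
    Column("legacy_signup_ts", "String", "old timestamp format", deprecated=True),
]

prompt = build_schema_prompt("users", users)
print(prompt)
```

Whether this alone would have fixed the wrong-data problem is an open question; it only removes one known source of confusion from the prompt.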

I wish you all the best of luck in improving on this kind of thing.

1 comment

ryadh 221 days ago

Thanks for your detailed reply. It is great to see that you have been experimenting with this approach.

We published a public demo of the Agentic Data Stack, and I'd love to hear your feedback: https://clickhouse.com/blog/agenthouse-demo-clickhouse-llm-m...

Keep in mind that it's not fully "fair", since these public datasets are often documented on the internet and so are already present in the pre-training data of the underlying models (Claude Sonnet 4.5 in this case).
