

by joshstrange 264 days ago

> Rather, people have to ask questions of it, and interact with the data. Increasingly, that is via AI tooling.

Given all my own experience on that front, it is terrifying if people are "increasingly" interacting with their data via AI tooling. In all the testing I've done, it can seem like magic ("Look, it just told us XXX piece of data and we just asked a simple question!"), but LLMs, even with copious amounts of context, are not good at understanding the business rules you need to interpret your data. That goes for just about any company with more than "Pet Store"-level complexity, especially after years or decades of the data growing and changing.

Perhaps this has improved or changed, but I use LLMs daily and nothing indicates to me that it has improved enough to make this worthwhile. I would assume any AI-only interface to data is either dealing with a laughably simple (or very new) dataset/schema or lying to you constantly.


ryadh 264 days ago

(Ryadh from ClickHouse here) Your comment is spot-on. This is the main challenge with Agentic Analytics, and there are known limitations. It is also where we are orienting our investments atm.

Our own experience running internal agents taught us that the best remediation comes from providing the LLMs with the maximum and most accurate context possible. Robust evaluations are also critical to measure accuracy, detect regressions, and improve. But there is no silver bullet.
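To make the evaluation point concrete, here is a minimal sketch of one way such a check could work: run the generated SQL and a hand-written reference query against the same data and compare the returned rows, not the SQL text. Everything here is an assumption for illustration, not our actual harness: the `signups` table, the `fake_generate_sql` stand-in for the model call, and SQLite as the engine.

```python
import sqlite3

def run(conn, sql):
    """Execute a query and return its rows sorted, so row order does not matter."""
    return sorted(conn.execute(sql).fetchall())

def evaluate(conn, cases, generate_sql):
    """Return the fraction of cases where generated SQL returns the reference rows."""
    passed = 0
    for question, reference_sql in cases:
        try:
            ok = run(conn, generate_sql(question)) == run(conn, reference_sql)
        except sqlite3.Error:
            ok = False  # SQL that fails to execute counts as a miss
        passed += ok
    return passed / len(cases)

# Hypothetical in-memory data standing in for a real warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE signups (user_id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO signups VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (3, "2024-01-02")],
)

# Each case pairs a question with a reference query a human has verified.
cases = [
    ("How many users signed up on 2024-01-01?",
     "SELECT COUNT(*) FROM signups WHERE day = '2024-01-01'"),
    ("How many users signed up in total?",
     "SELECT COUNT(*) FROM signups"),
]

def fake_generate_sql(question):
    # Stand-in for an LLM call: valid SQL both times, wrong data the second time.
    if "2024-01-01" in question:
        return "SELECT COUNT(*) FROM signups WHERE day = '2024-01-01'"
    return "SELECT COUNT(DISTINCT day) FROM signups"

accuracy = evaluate(conn, cases, fake_generate_sql)
print(accuracy)  # 0.5
```

Tracking this number across prompt or model changes is what surfaces regressions; the second case shows a query that executes fine yet returns the wrong answer.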

SOTA LLMs are increasingly good at generating SQL but remain notoriously bad with math and numbers in general. Combining them with powerful querying capabilities bridges that gap and makes the overall experience a useful one.
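As a rough illustration of that division of labor, the sketch below has the model produce only the query text while the database does the arithmetic. The `orders` table, the hard-coded `generated_sql` string, and SQLite are all assumptions standing in for a real model call and a real engine.

```python
import sqlite3

# Hypothetical data; amounts are stored in cents to keep the arithmetic exact.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (amount_cents INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1999, "2024-01-01"), (2501, "2024-01-01"), (700, "2024-01-02")],
)

# Assumed model output for "How much did we make on 2024-01-01?".
# The model never adds the numbers itself; it only writes the query.
generated_sql = "SELECT SUM(amount_cents) FROM orders WHERE day = '2024-01-01'"

(total_cents,) = conn.execute(generated_sql).fetchone()
print(total_cents)  # 4500
```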

IMO, we'll always have to deal with the stochastic nature of these models and with hallucinations, which calls for caution and requires raising awareness within the user base. What I found watching our users internally is that, while it's not magical, it allows users to request data more often, and that compounds in data-driven decision-making, assuming the users are trained to interpret the interactions.


joshstrange 264 days ago

I'll freely admit you have more data (experience) to work with on this than I did in the tests I ran almost a year ago. I spent a lot of time documenting my schemas, feeding the LLM sample rows, etc., and the final results were not useful enough even as a starting point for a static query that a developer would improve on and "hard code" into a UI. I approached it as both:

- Wouldn't it be cool to let my users chat with their data? ("How many new users signed up today/this event/this month/etc?" or "How much did we make yesterday?")

- An internal tool to use as a starting point for analytics dashboards

I still use LLMs to help write queries when I know something can be done but can't remember the syntax, but I scrapped the project that was meant to accomplish both of the above goals because there were too many mistakes. Maybe my data is just too "dirty" (but honestly, I've never _not_ seen dirty data), and/or I should have cleaned up the deprecated columns in my tables that confused the models (even with strict instructions to ignore them, I should have filtered them completely). Either way, I spent way too much time repeating myself, talking in all caps, and generally fighting with the SOTA models to get them to understand my data well enough to generate queries that actually worked (worked as in returned valid data, not just valid SQL). I wasn't doing any training/fine-tuning (which may be the magic needed), but it felt like a dead end given the models at the time. I'll also stress that I haven't re-tested those theories on newer models and my results are at least a year out of date (a lifetime in LLM/AI terms), but the fundamental issues I ran into didn't seem to be "on the cusp" of being solved or anything like that.
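For what it's worth, filtering them completely would have looked something like the sketch below: drop the deprecated columns from the schema description before it ever reaches the prompt, so the model can't reference what it never sees. The `users` table, the `legacy_plan` column, and the `build_schema_context` helper are made up for illustration; this is not what I actually ran.

```python
def build_schema_context(schema, deprecated):
    """Render a schema description for a prompt, omitting deprecated columns."""
    lines = []
    for table, columns in schema.items():
        hidden = deprecated.get(table, set())
        kept = [f"{name} {col_type}" for name, col_type in columns if name not in hidden]
        lines.append(f"{table}({', '.join(kept)})")
    return "\n".join(lines)

# Hypothetical schema: table name -> list of (column, type) pairs.
schema = {
    "users": [
        ("id", "INTEGER"),
        ("email", "TEXT"),
        ("legacy_plan", "TEXT"),  # deprecated, should never reach the model
        ("created_at", "TEXT"),
    ],
}
deprecated = {"users": {"legacy_plan"}}

context = build_schema_context(schema, deprecated)
print(context)  # users(id INTEGER, email TEXT, created_at TEXT)
```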

I wish you all the best of luck in improving on this kind of thing.


ryadh 264 days ago

Thanks for your detailed reply. It is great to see that you have been experimenting with this approach.

We published a public demo of the Agentic Data Stack, and I'd love to hear your feedback: https://clickhouse.com/blog/agenthouse-demo-clickhouse-llm-m...

Keep in mind that it's not a fully "fair" test, since these public datasets are often documented on the internet and so are likely already present in the pre-training data of the underlying model (Claude Sonnet 4.5 in this case).
