Hacker News
by getmeinrn 1071 days ago
>I can't even get ChatGPT (+ web plugin), when given a list of restaurants in NYC, to tell me which ones are still open vs have closed, and what their hours of operation / locations are.

This is doable using a tool I've built. The key is to have that data in an RDBMS and to use an LLM to generate the SQL query that answers your question. Companies haven't offered this yet because there's no safe way to execute these queries on your behalf, which is where my library comes in [1].
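
To make the flow concrete, here is a rough sketch of "question in, generated SQL through a gate, rows out". It uses Python's built-in sqlite3 module. Everything named here is hypothetical: `fake_llm` stands in for a real model call, `placeholder_gate` stands in for a real validator, and the `restaurants` schema is made up. It is not HeimdaLLM's API, and the gate is deliberately trivial, since real validation is the hard part being discussed.

```python
import sqlite3


def answer_question(conn, question, generate_sql, is_safe):
    """Ask the model for SQL, refuse anything the gate rejects, then run it."""
    sql = generate_sql(question)
    if not is_safe(sql):
        raise ValueError(f"refusing to run query: {sql!r}")
    return conn.execute(sql).fetchall()


def fake_llm(question):
    # Hypothetical stand-in for a real LLM call; always returns the same query.
    return "SELECT name, hours FROM restaurants WHERE is_open = 1 ORDER BY name"


def placeholder_gate(sql):
    # Placeholder only: accepts a single SELECT statement.
    # This is NOT a real safety check; it just marks where one belongs.
    body = sql.strip().rstrip(";")
    return body.upper().startswith("SELECT") and ";" not in body


# Made-up data for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, hours TEXT, is_open INTEGER)")
conn.executemany(
    "INSERT INTO restaurants VALUES (?, ?, ?)",
    [("Example Diner", "08:00-22:00", 1), ("Closed Cafe", None, 0)],
)

rows = answer_question(
    conn, "Which restaurants are still open?", fake_llm, placeholder_gate
)
```

The point of the shape is that the generated SQL never reaches the database without passing through the gate; how strong that gate is decides how safe the whole thing is.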

[1] https://github.com/amoffat/HeimdaLLM


Writing the SQL query is the easy part; collecting the data into the DB is the hard part. Can we get an LLM to collect the data into the DB? I was told LLMs are good at summarizing text like webpages into structured data.
The hard part is the SQL query, because you need to make sure the SQL query is safe to execute. Collecting data is far easier by comparison, but you absolutely could use an LLM for that too.
I’m not saying that the SQL query is at all easy, but since you have pretty much accomplished it in a short period of time, while Google, Yelp, etc. have still not completely solved the problem of store hours after decades of working on it, I’m going to lean towards data collection being the harder problem of the two.
The OP said "This is a pretty low bar request IMO", suggesting that the problem they expect an LLM to handle is not the hard problem you're saying Google and Yelp have not solved. It's a different problem.
Can you give an example of how you'd define safe operations?

I think a lot of use cases could just be 1) set up a database with only public data and 2) use a read-only user.

The much trickier use cases are those where you want to allow inserts and updates, but only on specific tables or rows.

That's mostly safe, but even then, a user could execute "SELECT SLEEP(100000000)" thousands of times and DoS your database. There are other unsafe functions that a read-only user can execute as well. I've written extensively about some of the attack surface here: https://docs.heimdallm.ai/en/latest/attack_surface/sql.html
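
One generic mitigation for the runaway-query problem is a per-query time budget. Below is a minimal sketch using Python's built-in sqlite3 module and its progress handler hook. SQLite has no SLEEP function, so an unbounded recursive CTE plays the role of the slow query. `run_with_time_budget` and the budget values are my own hypothetical choices, and this is separate from whatever HeimdaLLM does. Server databases usually have their own statement timeout settings for the same purpose.

```python
import sqlite3
import time


def run_with_time_budget(conn, sql, params=(), budget_seconds=0.5):
    """Run a query, aborting it once it has used up its wall-clock budget."""
    deadline = time.monotonic() + budget_seconds

    def over_budget():
        # A non-zero return value tells SQLite to interrupt the running query.
        return 1 if time.monotonic() > deadline else 0

    # Call the handler roughly every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(over_budget, 1000)
    try:
        return conn.execute(sql, params).fetchall()
    finally:
        # Remove the handler so later queries are unaffected.
        conn.set_progress_handler(None, 0)


conn = sqlite3.connect(":memory:")

# A query that never finishes on its own: it counts upward forever.
runaway = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) "
    "SELECT count(*) FROM c"
)

try:
    run_with_time_budget(conn, runaway, budget_seconds=0.05)
    aborted = False
except sqlite3.OperationalError:
    aborted = True
```

A time budget caps the cost of one query. It does not stop someone from sending thousands of them, so rate limiting is still needed on top.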

HeimdaLLM can allowlist functions and constrain queries to ensure that required conditions exist. This gives LLM + database usage far more utility; for example, a user can be restricted to only the data in their account. Support for INSERT and UPDATE is coming very soon.
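
To show what function allowlisting means in general, here is a sketch using SQLite's authorizer hook through Python's built-in sqlite3 module. This is not HeimdaLLM's API or implementation; it only demonstrates the idea of rejecting any query that calls a function outside an approved set. The allowlist contents and the `restaurants` table are made up, and `randomblob` stands in for a function you would not want untrusted queries to call.

```python
import sqlite3

# Action code SQLite passes when a query references an SQL function.
# Falls back to 31, its value in sqlite3.h, in case a build does not export it.
SQLITE_FUNCTION = getattr(sqlite3, "SQLITE_FUNCTION", 31)

# Hypothetical allowlist: only these SQL functions may appear in a query.
ALLOWED_FUNCTIONS = {"count", "lower", "upper"}


def allowlist_functions(action, arg1, arg2, db_name, source):
    # For SQLITE_FUNCTION actions, the function name arrives as arg2.
    if action == SQLITE_FUNCTION and (arg2 or "").lower() not in ALLOWED_FUNCTIONS:
        return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT)")
conn.execute("INSERT INTO restaurants VALUES ('Example Diner')")
conn.commit()

# Install the authorizer only after the trusted setup is done.
conn.set_authorizer(allowlist_functions)

allowed_rows = conn.execute("SELECT upper(name) FROM restaurants").fetchall()

try:
    conn.execute("SELECT randomblob(16)")
    blocked = False
except sqlite3.DatabaseError:
    # SQLite refuses to compile a query that uses a function not on the list.
    blocked = True
```

Because the check runs inside the database's own query compiler, it does not rely on parsing SQL text with regular expressions. Requiring conditions such as "only rows for this account" is a different and harder problem that this sketch does not attempt.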