by erichmond 1071 days ago
I do think AI is here to stay, but that initial round of "AI is going to destroy the world" hype burned fast and hot.

I can't even get ChatGPT (+ web plugin), when given a list of restaurants in NYC, to tell me which ones are still open vs have closed, and what their hours of operation / locations are.

This is a pretty low bar request IMO. Showed me we still have a ways to go before AI is where we thought it was 3-4 months ago.

7 comments

That's not an LLM problem; there's a divide between a physical store's actual operation and whatever its representation online is. Google tries to bridge this by tracking people's activity at locations to see whether anyone is actually at a place, but the reality is there is no clean interface to that other than contacting someone at the location.
I believe Google's data is now based largely on restaurants updating Google directly with that info. You can see in some searches where there is a prompt for the business owner to register and update/correct the hours. In my neighborhood it's to the point where many won't even post hours on their website any more, you have to check Google, which is annoying as I try to use other search engines. Sometimes on Google I'll see "updated by business owner 12 days ago" or similar. I presume Yelp has a similar (but probably smaller) database of its own.

In any case the impact for LLMs is the same, it's unavailable to them (unless they are being developed inside Google!).

This seems like a specific area where a "Semantic Web" solution could work well - some HTML tags that are specific to hours of operation, which business owners would embed in their website.

UPDATE: It looks like there is some prior art on this idea, though I'm not sure how widely it is supported: https://schema.org/openingHours
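For anyone curious what consuming that markup looks like: a minimal sketch, assuming the simple `Mo-Fr 09:00-17:00` string form shown in the schema.org examples. The function names are made up for illustration, and it ignores holidays, closing times past midnight, and entries without a time range.

```python
from datetime import datetime, time

# schema.org day codes, Monday first so indices match datetime.weekday().
DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]


def parse_days(spec: str) -> set[int]:
    """Hypothetical helper: expand 'Mo-Fr' or 'Sa,Su' into weekday indices (0 = Monday)."""
    days: set[int] = set()
    for part in spec.split(","):
        if "-" in part:
            start, end = (DAYS.index(d) for d in part.split("-"))
            i = start
            while True:  # walk forward so wrapping ranges like 'Sa-Mo' work
                days.add(i)
                if i == end:
                    break
                i = (i + 1) % 7
        else:
            days.add(DAYS.index(part))
    return days


def is_open(entries: list[str], when: datetime) -> bool:
    """True if `when` falls in any 'Days HH:MM-HH:MM' entry (same-day hours only)."""
    for entry in entries:
        day_spec, hours = entry.split()
        opens, closes = (time.fromisoformat(t) for t in hours.split("-"))
        if when.weekday() in parse_days(day_spec) and opens <= when.time() < closes:
            return True
    return False


# Hypothetical listing. July 4, 2023 was a Tuesday, and the markup
# knows nothing about holiday closures.
hours = ["Mo-Fr 11:00-22:00", "Sa,Su 10:00-23:00"]
print(is_open(hours, datetime(2023, 7, 4, 12, 30)))  # → True
```

Which is exactly the July Fourth problem mentioned below: structured hours are only as good as the owner's willingness to keep them current.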

Google even tries to bridge that gap by having automated phone calls to places to confirm their hours.

In fact this past July Fourth weekend I saw so many highly rated restaurants close without updating their hours on Google or Yelp or anywhere else online.

This is the #1 thing that makes me boycott a restaurant. If you don’t care enough about your customers to spend 2 minutes updating your hours then I will never eat there again. It’s absolutely disrespectful to your users/customers.
It can't be perfect without something like sending drones around to every address, but in theory it's a task where an LLM could do better and faster than a human: there's plenty of unstructured chatter, news articles, and such online about businesses closing, opening, etc.

But it's not JUST an LLM problem: it's also a search problem and a connection-making-problem ("if a new restaurant opened at that address, the old one is probably closed").
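A toy version of that connection-making step, assuming you already have scraped listings with an address and the date each business was first seen there (the `Listing` type, its fields, and the sample businesses are all hypothetical). It's a heuristic, not proof of closure:

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Listing:
    """Hypothetical scraped record."""
    name: str
    address: str
    first_seen: date  # when this business first appeared at the address


def normalize(address: str) -> str:
    """Crude normalization so '12 Bleecker St.' and '12 bleecker st' match."""
    return " ".join(address.lower().replace(".", "").replace(",", "").split())


def probably_closed(listings: list[Listing]) -> set[str]:
    """Flag every listing that has a newer listing at the same address."""
    newest: dict[str, Listing] = {}
    for item in listings:
        key = normalize(item.address)
        if key not in newest or item.first_seen > newest[key].first_seen:
            newest[key] = item
    return {item.name for item in listings if newest[normalize(item.address)] is not item}


listings = [
    Listing("Old Diner", "12 Bleecker St.", date(2015, 3, 1)),
    Listing("New Ramen Bar", "12 bleecker st", date(2023, 5, 20)),
    Listing("Corner Deli", "40 Grand St", date(2010, 1, 1)),
]
print(probably_closed(listings))  # → {'Old Diner'}
```

Real addresses would need much better normalization (unit numbers, "Street" vs "St"), which is part of why this isn't a trivial problem.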

And then even if there was a ChatGPT browsing plugin that excelled at scraping all of the relevant up-to-date info off the internet, you'd still need some layers in between that and today's context-window limits.
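One of those layers could be as simple as chunking: split each scraped page so every piece fits the prompt budget, ask the model about each chunk, then merge the answers. A rough sketch; it approximates tokens with whitespace-separated words, which a real model's tokenizer will count differently, and the chunk sizes are arbitrary:

```python
def chunk_text(text: str, max_tokens: int = 1000, overlap: int = 50) -> list[str]:
    """Split text into overlapping chunks of at most max_tokens 'tokens'.

    Tokens are approximated by whitespace-separated words; overlap keeps
    a fact that straddles a boundary visible in both neighboring chunks.
    """
    if overlap >= max_tokens:
        raise ValueError("overlap must be smaller than max_tokens")
    words = text.split()
    chunks = []
    step = max_tokens - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_tokens]))
        if start + max_tokens >= len(words):
            break  # this chunk already reached the end of the text
    return chunks


page = " ".join(f"w{i}" for i in range(10))
print(chunk_text(page, max_tokens=4, overlap=1))
# → ['w0 w1 w2 w3', 'w3 w4 w5 w6', 'w6 w7 w8 w9']
```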

As more things change in the real world after the training cutoff of today's publicly exposed OpenAI models, we'll probably see some further disillusionment from people who thought there was a bit more magic there than there was. But "LLMs, but with more up-to-date info" isn't an impossible problem with today's tech (even if you only fake it with multiple agents, multiple steps, batch jobs behind the scenes, etc.); it's just not a trivial one.

>I can't even get ChatGPT (+ web plugin), when given a list of restaurants in NYC, to tell me which ones are still open vs have closed, and what their hours of operation / locations are.

This is doable using a tool I've built. The key is to have that data in an RDBMS and to use an LLM to generate the SQL query that answers your question. Companies haven't offered this yet because there's no safe way to execute these queries on your behalf. Which is where my library comes in[1].

1. https://github.com/amoffat/HeimdaLLM
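For readers unfamiliar with the pattern, the usual first step of text-to-SQL is showing the model the real schema alongside the question. This is a generic sketch using Python's built-in sqlite3, not HeimdaLLM's API; the `restaurants` table and the prompt wording are hypothetical, and the actual model call is left out:

```python
import sqlite3


def schema_prompt(conn: sqlite3.Connection, question: str) -> str:
    """Build a text-to-SQL prompt that includes the database's real schema."""
    rows = conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND sql IS NOT NULL"
    ).fetchall()
    schema = "\n".join(row[0] + ";" for row in rows)
    return (
        "You write a single read-only SQLite SELECT statement.\n"
        f"Schema:\n{schema}\n"
        f"Question: {question}\n"
        "SQL:"
    )


# Hypothetical table holding the scraped restaurant data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE restaurants ("
    "name TEXT, address TEXT, status TEXT, opens TEXT, closes TEXT)"
)
print(schema_prompt(conn, "Which restaurants on my list are still open?"))
```

Whatever SQL comes back is untrusted input, which is the safety problem the rest of this thread is about.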

Writing the SQL query is the easy part, collecting the data into the DB is the hard part. Can we get an LLM to collect the data into the DB? I was told LLMs are good at summarizing text like webpages into structured data.
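Roughly what that could look like, assuming the model has been asked to return a JSON array of `{name, address, status}` objects (the field names, the allowed status values, and the sample output are hypothetical). The point is to validate the model's output before it touches the DB rather than trusting it:

```python
import json
import sqlite3

REQUIRED = {"name": str, "address": str, "status": str}  # hypothetical fields


def store_extraction(conn: sqlite3.Connection, raw: str) -> int:
    """Validate a model's JSON output and insert the records that pass.

    Returns the number of rows inserted; malformed records are skipped.
    """
    try:
        records = json.loads(raw)
    except json.JSONDecodeError:
        return 0
    if not isinstance(records, list):
        return 0
    inserted = 0
    for rec in records:
        if not isinstance(rec, dict):
            continue
        if not all(isinstance(rec.get(k), t) for k, t in REQUIRED.items()):
            continue
        if rec["status"] not in ("open", "closed"):
            continue
        # Parameterized insert: model output is never spliced into SQL text.
        conn.execute(
            "INSERT INTO restaurants (name, address, status) VALUES (?, ?, ?)",
            (rec["name"], rec["address"], rec["status"]),
        )
        inserted += 1
    conn.commit()
    return inserted


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT, address TEXT, status TEXT)")
# Stand-in for what a model might return after reading a web page.
raw = (
    '[{"name": "Old Diner", "address": "12 Bleecker St", "status": "closed"},'
    ' {"name": "Mystery Spot", "status": "maybe"}]'
)
print(store_extraction(conn, raw))  # → 1
```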
The hard part is the SQL query, because you need to make sure the SQL query is safe to execute. Collecting data is far easier by comparison, but you absolutely could use an LLM for that too.
I’m not saying that the SQL query is at all easy, but since you have pretty much accomplished it in a short period of time, while Google, Yelp, etc. still haven't completely solved the problem of store hours after decades of working on it, I’m going to lean towards data collection being the harder of the two problems.
The OP said "This is a pretty low bar request IMO", suggesting that the problem they expect an LLM to be able to solve is not the hard problem you're saying Google and Yelp have not solved. It's a different problem.
Can you give an example of how you'd define safe operations?

I think a lot of use cases could just be 1) set up a database with only public data and 2) use a read-only user.

The much trickier use case is those where you want to allow inserts and updates but only on specific tables or rows.
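One way to sketch the table-level half of that in SQLite is an authorizer callback, which SQLite consults while compiling each statement. This is an illustration under assumptions (the reviews-only write policy is made up), and it is default-deny, so anything not explicitly allowed is refused. Row-level rules would need more than this.

```python
import sqlite3

# Hypothetical policy: users may add or edit reviews, nothing else.
WRITABLE_TABLES = {"reviews"}
ALWAYS_ALLOWED = {sqlite3.SQLITE_SELECT, sqlite3.SQLITE_READ, sqlite3.SQLITE_TRANSACTION}
WRITE_ACTIONS = {sqlite3.SQLITE_INSERT, sqlite3.SQLITE_UPDATE}


def table_write_authorizer(action, arg1, arg2, db_name, trigger):
    """Default-deny: allow reads, plus INSERT/UPDATE on allowlisted tables."""
    if action in ALWAYS_ALLOWED:
        return sqlite3.SQLITE_OK
    if action in WRITE_ACTIONS and arg1 in WRITABLE_TABLES:
        return sqlite3.SQLITE_OK  # arg1 is the table name for these actions
    return sqlite3.SQLITE_DENY


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT)")
conn.execute("CREATE TABLE reviews (restaurant TEXT, stars INTEGER)")
conn.set_authorizer(table_write_authorizer)  # install after the schema exists

conn.execute("INSERT INTO reviews VALUES ('Corner Deli', 5)")  # allowed
try:
    conn.execute("INSERT INTO restaurants VALUES ('Sneaky Bistro')")
    restaurants_blocked = False
except sqlite3.DatabaseError:  # SQLite reports "not authorized"
    restaurants_blocked = True
print(restaurants_blocked)  # → True
print(conn.execute("SELECT * FROM reviews").fetchall())  # → [('Corner Deli', 5)]
```

SQLITE_TRANSACTION is allowed because Python's sqlite3 issues an implicit BEGIN before the INSERT.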

That's mostly safe, but even then, a user could execute "SELECT SLEEP(100000000)" thousands of times and DoS your database. There are other unsafe functions that a readonly user can execute as well. I've written extensively on some of the attack surface here https://docs.heimdallm.ai/en/latest/attack_surface/sql.html
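SQLite has no SLEEP(), but a runaway recursive query has the same effect. A minimal guard, assuming SQLite via Python's sqlite3, is a progress handler that aborts statements past a deadline; server databases have their own equivalents, such as PostgreSQL's `statement_timeout`. The function name and the 0.5 s default are arbitrary:

```python
import sqlite3
import time


def run_with_deadline(conn: sqlite3.Connection, sql: str, seconds: float = 0.5) -> list:
    """Run a query, aborting it if it runs longer than `seconds`."""
    deadline = time.monotonic() + seconds
    # SQLite calls the handler every 10,000 VM instructions; a truthy
    # return value aborts the running statement.
    conn.set_progress_handler(lambda: time.monotonic() > deadline, 10_000)
    try:
        return conn.execute(sql).fetchall()
    finally:
        conn.set_progress_handler(None, 0)  # remove the handler again


conn = sqlite3.connect(":memory:")
print(run_with_deadline(conn, "SELECT 1 + 1"))  # → [(2,)]

# Never terminates on its own: counts an infinite recursive sequence.
runaway = (
    "WITH RECURSIVE c(x) AS (SELECT 1 UNION ALL SELECT x + 1 FROM c) "
    "SELECT count(*) FROM c"
)
try:
    run_with_deadline(conn, runaway, seconds=0.2)
except sqlite3.OperationalError as exc:
    print("aborted:", exc)
```

A per-query timeout only limits one query; thousands of them still need rate limiting on top.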

HeimdaLLM can allowlist functions and constrain queries to ensure that required conditions exist. This makes LLM + database usage have far more utility, for example, a user can be restricted to only data in their account. Support for INSERT and UPDATE is coming very soon.
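For illustration only (this is not how HeimdaLLM is implemented; it's a plain SQLite authorizer), function allowlisting can be sketched like this. The allowlist is hypothetical, and the callback deliberately checks nothing but function calls, so it would sit alongside read-only access or table-level rules:

```python
import sqlite3

# Hypothetical allowlist; anything else (e.g. randomblob) is refused.
ALLOWED_FUNCTIONS = {"count", "lower", "upper", "length"}


def function_authorizer(action, arg1, arg2, db_name, trigger):
    """Refuse any SQL function call whose name is not allowlisted."""
    if action == sqlite3.SQLITE_FUNCTION:
        # For SQLITE_FUNCTION, SQLite passes the function name as arg2.
        if (arg2 or "").lower() not in ALLOWED_FUNCTIONS:
            return sqlite3.SQLITE_DENY
    return sqlite3.SQLITE_OK


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE restaurants (name TEXT)")
conn.execute("INSERT INTO restaurants VALUES ('Corner Deli')")
conn.set_authorizer(function_authorizer)

print(conn.execute("SELECT upper(name) FROM restaurants").fetchall())
# → [('CORNER DELI',)]
try:
    # Denied while the statement is compiled, so nothing is allocated.
    conn.execute("SELECT randomblob(1000000000)")
    blob_blocked = False
except sqlite3.DatabaseError:
    blob_blocked = True
print(blob_blocked)  # → True
```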

Hey, the initial round of AI was decades ago, FYI.
You and me reading books about fuzzy logic in the 80s didn't constitute the initial round of AI.
> You and me reading books about fuzzy logic in the 80s didn't constitute the initial round of AI.

Right, that was at least the second round, as it occurred after the first “AI Winter”.

The first round was a lot earlier than that.

Oh man… all those genetic algorithms were a waste of time then?
Lotfi Zadeh ftw!
Most of the plugins are garbage. I just want ChatGPT to have up-to-date knowledge and to work natively with more media types. I don't think I've used a single plugin that I thought gave good results, but I love GPT-4 for work.
The issue there is that there is a difference between what an expert who gets lucky can do with an LLM and what the average person can do with it. People are sold by articles about the former, but long-term word of mouth depends on the latter, and some portion of normal people are going to get frustrated and give up on LLMs rather than build their skill set.
Not all of us were fooled by it!
That sounds fairly easy to do. Have you tried writing a short Python script yourself to get the hours of operation for your restaurant list?

Pardon me plugging my own book [1], but I have an example using LangChain and LlamaIndex to answer questions from scraped websites. You could probably do this with a 20-line Python script.
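Not the LangChain/LlamaIndex approach from the book, but for sites that publish schema.org markup, a stdlib-only sketch can pull `openingHours` straight out of the page's JSON-LD. Fetching is left out (something like `urllib.request.urlopen` would do it), the class and function names and the sample page are made up, and `@graph`-style nested markup isn't handled:

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data  # data may arrive in several pieces


def opening_hours(html: str) -> list[str]:
    """Return every openingHours string found in the page's JSON-LD."""
    parser = JsonLdParser()
    parser.feed(html)
    hours: list[str] = []
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # skip broken markup instead of failing the whole page
        for item in data if isinstance(data, list) else [data]:
            if not isinstance(item, dict):
                continue
            value = item.get("openingHours")
            if isinstance(value, str):
                hours.append(value)
            elif isinstance(value, list):
                hours.extend(v for v in value if isinstance(v, str))
    return hours


page = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Restaurant",
 "name": "Corner Deli", "openingHours": ["Mo-Fr 07:00-19:00", "Sa 08:00-14:00"]}
</script>
</head><body>...</body></html>
"""
print(opening_hours(page))  # → ['Mo-Fr 07:00-19:00', 'Sa 08:00-14:00']
```

Pages without that markup are where an LLM reading the free text would earn its keep.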

[1] read free online: https://leanpub.com/langchain/read