Hacker News
by rtkwe 1175 days ago
I still think we're a long way off. To my knowledge, LLMs can't currently turn a request into a lookup against, say, an actual database of facts, or parse a request into API actions. So far they've shown they're really, really good at continuing a conversation with more text, but as far as I understand them, there's no usable comprehension of what's actually being asked and answered.

What would tell me that an LLM has any real "understanding" of what it's saying is the point where it can reliably say "I don't know the answer to that" instead of making things up from scratch. You see the fabrication a lot if you ask Bing/Bard "Who is _____?" Most answers are roughly right, but many major details are completely fabricated. A lot of the facts they get wrong are things Google already returns when queried, like where Person X was born or where they went to school. The fact that these LLMs can't slot in facts that are readily available suggests to me they won't be that useful for the kinds of tasks we've been working on NLP for.

> LLMs can't to my knowledge process a request into a lookup on say an actual database of facts at the moment or parse a request into API actions.

They can; see Toolformer: https://arxiv.org/abs/2302.04761

> In this paper, we show that LMs can teach themselves to use external tools
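
To make the "parse a request into API actions" part concrete, here is a minimal Python sketch of the execution side only. It assumes the model has already emitted inline calls such as `[Calculator(400 / 1400)]`, then runs them and splices the results back into the text. The bracket syntax, the tool names `Calculator` and `Lookup`, and the one-entry fact table are hypothetical and only loosely inspired by the linked paper; teaching the model when to emit a call is the hard part and is not shown.

```python
import ast
import operator
import re

# Hypothetical fact table standing in for "an actual database of facts".
FACTS = {
    "boiling point of water at sea level": "100 °C",
}

# Only these arithmetic operators are allowed in Calculator calls.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def _eval_arith(node):
    """Evaluate a parsed expression made only of numbers and + - * /."""
    if isinstance(node, ast.Expression):
        return _eval_arith(node.body)
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_eval_arith(node.left), _eval_arith(node.right))
    raise ValueError("unsupported expression")


def calculator(expr: str) -> str:
    """Hypothetical Calculator tool: safe arithmetic, rounded to 2 decimals."""
    return str(round(_eval_arith(ast.parse(expr, mode="eval")), 2))


def lookup(key: str) -> str:
    """Hypothetical Lookup tool: returns "unknown" instead of guessing."""
    return FACTS.get(key.strip().lower(), "unknown")


TOOLS = {"Calculator": calculator, "Lookup": lookup}

# Matches inline calls like [Calculator(400 / 1400)]; nesting is not supported.
CALL = re.compile(r"\[(\w+)\(([^()\]]*)\)\]")


def execute_calls(text: str) -> str:
    """Replace each recognized call with the call plus its result."""
    def run(match):
        name, arg = match.group(1), match.group(2)
        tool = TOOLS.get(name)
        if tool is None:
            return match.group(0)  # unknown tool: leave the text untouched
        try:
            result = tool(arg)
        except (ValueError, SyntaxError, ZeroDivisionError):
            result = "error"
        return f"[{name}({arg}) -> {result}]"

    return CALL.sub(run, text)
```

Running `execute_calls("Out of 1400 participants, 400 [Calculator(400 / 1400)] passed.")` returns `"Out of 1400 participants, 400 [Calculator(400 / 1400) -> 0.29] passed."`. The calculator walks the parsed expression tree rather than calling `eval`, so model output cannot execute arbitrary code, and the lookup answers "unknown" rather than inventing a fact.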

> LLMs can't to my knowledge process a request into a lookup on say an actual database of facts at the moment or parse a request into API actions.

Bing Chat and ChatGPT plugins are both examples of doing exactly this.

You’re right that they make up answers, though humans are often quite prone to that too…

A human who isn't incentivized to lie, or who is directly incentivized to be truthful, could at least tell you when they're making something up, whereas Bing/Bard seemingly cannot. Once they can do that, I think they'll be far more useful; at least then you'd have a rough idea of how much you need to check the bot's work. If I have to check everything it spits out, the best it can do for me is give me new words to use while searching.

Granted, getting the name of something to search for is often half the battle in tech.

> could at least tell you when they're making something up themselves where Bing/Bard seemingly cannot.

In fact, GPT-4 is quite good at catching hallucinations when the question-answer pair is fed back into it.

This isn’t applied automatically because the model is expensive to run, but you can do it yourself (or automate it with a plugin or LangChain) and pay the extra cost.
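
A minimal sketch of doing the check yourself, in Python. `complete` is a hypothetical stand-in for whatever model call you use (a vendor SDK, LangChain, etc.), and the verification prompt and the SUPPORTED/UNSUPPORTED convention are assumptions for illustration, not any library's API. The reviewer is the same model, so its verdict can be wrong too; this only flags answers for a closer look, at the cost of one extra call per question.

```python
from typing import Callable, Tuple

# Hypothetical signature: takes a prompt, returns the model's text reply.
Complete = Callable[[str], str]

# Hypothetical verification prompt; the wording is an assumption, not a known-good recipe.
VERIFY_TEMPLATE = (
    "Question: {question}\n"
    "Proposed answer: {answer}\n"
    "Does the proposed answer contain any claim that is fabricated or unsupported? "
    "Reply with SUPPORTED or UNSUPPORTED on the first line, then explain."
)


def answer_with_self_check(question: str, complete: Complete) -> Tuple[str, bool]:
    """Answer a question, then feed the question-answer pair back for review.

    Returns (answer, flagged).
    """
    answer = complete(question)
    review = complete(VERIFY_TEMPLATE.format(question=question, answer=answer)).strip()
    verdict = review.splitlines()[0].strip().upper() if review else ""
    # Anything other than an explicit SUPPORTED verdict is treated as flagged.
    flagged = not verdict.startswith("SUPPORTED")
    return answer, flagged


def fake_complete(prompt: str) -> str:
    """Canned replies so the example runs without a real model."""
    if prompt.startswith("Question:"):
        return "UNSUPPORTED\nThe birthplace cannot be confirmed."
    return "Person X was born in Springfield."


answer, flagged = answer_with_self_check("Where was Person X born?", fake_complete)
print(answer, "| flagged:", flagged)
```

Defaulting to "flagged" when the review is missing or malformed is a deliberate choice: for the use case in this thread (knowing how much of the bot's work to double-check), a false alarm is cheaper than a silent fabrication.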

Remember that the model performs only a fixed amount of computation per generated token, so asking it to think out loud or to evaluate its own responses effectively gives it a scratchpad: every extra token it writes is extra computation spent on your question.
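
A back-of-the-envelope sketch of that point, using the common rule of thumb that a dense transformer's forward pass costs roughly 2 FLOPs per parameter per generated token. The approximation ignores the attention term that grows with context length, and the model size and token counts below are hypothetical (GPT-4's size is not public); only the ratio matters.

```python
def forward_flops(n_params: float, n_tokens: int) -> float:
    """Approximate generation compute: ~2 FLOPs per parameter per token.

    Rule of thumb for dense transformers; ignores the context-length-dependent
    attention cost, so per-token compute is treated as constant.
    """
    return 2 * n_params * n_tokens


N = 7e9  # hypothetical parameter count; it cancels out in the ratio below
direct = forward_flops(N, 20)                 # a 20-token answer on its own
with_scratchpad = forward_flops(N, 300 + 20)  # 300 "thinking" tokens, then the answer
print(with_scratchpad / direct)  # prints 16.0
```

Under this approximation, total compute grows linearly with the number of tokens generated, so the scratchpad buys about 16 times the computation here. Whether that extra computation actually improves the answer depends on the task.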