by lqhl 1059 days ago
This article suggests that LLMs should use a database as a reference for factual information. Rather than asking LLMs to provide their own answers, it is recommended that they summarize based on the facts extracted from the database. This approach reduces the likelihood of hallucinations among LLMs.
4 comments

We already had databases of facts, like Wolfram Alpha, decades before LLMs, and we largely ignored them. It's ironic that when trying to solve AI problems we keep reverting to old patterns we've tried since the 80s, and they keep failing. Habits die hard, I guess.

There's a categorical difference between knowing a fact, and looking up a fact. When you know a fact you can recognize it in a situation where you wouldn't know to look it up, and you'd know to utilize it in a larger solution rather than simply parrot it when specifically asked about it.

Databases of facts have had and will still have their place, but they are absolutely not the solution to LLMs telling fact from fiction. They have to have this innately in their model. I don't believe the nature of LLM is to hallucinate. It's instead a side effect of how we train them. We train them to guess, to be close, but not to be correct necessarily. And why is it a surprise that's precisely what they do?

Also, LLMs are too small to be accurate. They're tiny. GPT4 is roughly 40 times smaller than a human brain. And GPT4 is very large compared to GPT-3, and GPT-3 is very large compared to LLaMA 2.

We'll need hardware to catch up so we can scale things up pragmatically and see what happens to their ability to grasp facts. We'll also need architectural changes, of course.

Not to mention our wetbrain software is analog, resonant with the environment, continuous and has a single uptime, in most cases. We should consider developing LLMs in proto-human-history style: random noise meets environs, evolve useful signs and symbols based on clusters of semantic embedding. Uno reverse it through a dynamic parallel narrative simulator circuit with a range of values for interpretative feedback of context analysis. Assign allomorphic symbols to conceptual clusters. Refine resolution. Add modules for memory and inputs for updating knowledge, shine it up with some polish and you've got AGI.
> I don't believe the nature of LLM is to hallucinate. It's instead a side effect of how we train them. We train them to guess, to be close, but not to be correct necessarily.

Throughout this comment you speak about LLMs as if they're animals, or real physical objects. An LLM is a formal model whose only job is to generate a sequence of tokens maximally probabilistically consistent with a corpus of historical text.

A digital machine running an LLM program is a physical object which necessarily generates text based on "guessing" because that's the algorithm it's running. LLMs are "guessing algorithms", all of Machine Learning is -- it is dumb brute-force analysis of conditional probability.

> GPT4 is roughly 40 times smaller than a human brain

This doesn't make any sense. GPT4 is an abstract algorithm with no "size". The brain has 10^{big number} cells, and GPT4 can be specified with a single real number. Is that the comparison to make? No, both comparisons are incoherent.

A physical device running GPT4 can be given a "size", but it would again have nothing to do with a brain.

LLMs aren't living things where we can "measure their size" and "train them to know, rather than to guess". They are just the equation, `max P(answer|prompt, historical_corpus)`
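For readers who want that rule spelled out, here is one common way to write it. This is a sketch in my own notation, not something the commenter gave: x is the prompt, y_1..y_T are the generated tokens, and theta stands for the parameters fitted to the historical corpus (all of these symbols are assumptions).

```latex
\hat{y}_{1:T} = \arg\max_{y_{1:T}} P_\theta(y_{1:T} \mid x),
\qquad
P_\theta(y_{1:T} \mid x) = \prod_{t=1}^{T} P_\theta(y_t \mid x, y_{<t})
```

The product on the right is the chain rule of probability: the score of a whole answer is the product of the next-token probabilities, each conditioned on the prompt and on the tokens generated so far.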

A machine running GPT4 is just an electrical device generating text according to the rule given above. There is no sense of "training it to do something other than guesswork", and no sense of "size"

Larger models are not always more accurate. Overbuilding a model often leads to "overfitting" the dataset. A good example: the iPhone text prediction model. It now has so much data that the suggested completed words are often useless and irrelevant in context.
This is assuming LLMs are intelligent and can think "hey I am dumb, I'll look that up".

What they are literally doing is guessing the next word, a word at a time, but doing it really, really well and making statistically average output over a very large number of inputs.

There is no distinction between understanding "the" vs "a" and telling me 1+1=3. It is all token generation.

What they are doing depends entirely on what decoding algorithm you use. An LLM is mostly a token probability function, but it's not just that - a transformer model is capable of learning anything. Tokens are the interface, not necessarily the implementation.
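To make the point about decoding concrete, here is a minimal sketch of two decoding strategies applied to the same token scores. The tokens and logits are made up for illustration, and the function names are hypothetical; a real model would produce the scores itself.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(tokens, logits):
    """Always pick the single highest-scoring token."""
    best = max(range(len(logits)), key=lambda i: logits[i])
    return tokens[best]

def sample_decode(tokens, logits, temperature=1.0, rng=None):
    """Draw one token at random, weighted by its softmax probability."""
    rng = rng or random.Random()
    probs = softmax(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to continuations of "1 + 1 =".
tokens = ["2", "3", "two", "11"]
logits = [4.0, 1.5, 2.0, 0.5]

print(greedy_decode(tokens, logits))  # prints "2"
rng = random.Random(0)
print([sample_decode(tokens, logits, temperature=1.5, rng=rng) for _ in range(5)])
```

Same probability function, different behaviour: greedy decoding returns the top token every time, while sampling can occasionally return "3". As the temperature approaches zero, sampling converges on the greedy choice.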
A transformer can only memorize, it doesn't learn to do.

For what that concerns us here: LLMs will never learn to fact-check anything. They'll blindly regurgitate the facts they have been "taught", but never consider or evaluate "the paper cited for this fact on wikipedia is a bunch of bullshit".

Any attempt to use them to produce "facts" is ultimately just folly, in the same way Google's attempt to do so with its search engine index is.

> [LLMs] never consider or evaluate "the paper cited for this fact on wikipedia is a bunch of bullshit".

Nor do people, though! This is setting the bar way too high.

The whole point of having edited reference sources like "encyclopedias" is so that we can rely on the expertise of the editors in lieu of having to develop the expertise ourselves[1].

No, an LLM that simply knows a priori (via prompt hacking) which sources are trustworthy would be absolutely comparable to the way an educated-but-non-expert human approaches sources.

[1] Which is a chicken and egg problem anyway. Everyone starts with edited reference sources as tutorial material. Quite frankly everyone starts learning with wikipedia.

> This is setting the bar way too high.

No. If these things are claimed to be sources of truth, then the bar needs to be that high.

It is precisely because people don't fact-check that the bar has to be so high.

> If these things are claimed to be sources of truth

That's a strawman, though. No service, nor human, "claims to be a source of truth" in the kind of profound sense you seem to be using. It stops, everywhere, at "Wikipedia (or whatever) said it and I trust it".

The only way to get access to deeper expertise is to (1) BE an expert and (2) engage in a discussion with another.

No, a transformer is a universal function approximator and is capable of learning to do anything to some degree of accuracy.

GPT doesn't do math correctly but it also doesn't just memorize it.

It seems to me that LLMs are basically an algorithmic encoding of Occam's Razor. The issue seems to be that what is most probable does not always correspond to what happens, or what makes the most sense to an embodied person.

What is most probable is not always what is most correct or most accurate.

Isn't this a serious simplification? Tokens are just the medium
This blog is only having an LLM assess what column it should run a query against.

Why is that necessary? Why have an LLM guess where the facts are?

Put all of that data in a place where it's normalized and ready to vector search.
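A rough sketch of what "ready to vector search" could look like, assuming a toy bag-of-words embedding in place of a real embedding model. The function names and the sample documents are hypothetical.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def search(query, documents, top_k=2):
    """Return the top_k (score, document) pairs most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(doc)), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]

# Hypothetical, already-normalized records (lowercase, one fact per row).
documents = [
    "the eiffel tower is in paris",
    "the colosseum is in rome",
    "paris is the capital of france",
]
results = search("where is the eiffel tower", documents, top_k=1)
print(results[0][1])  # prints "the eiffel tower is in paris"
```

No LLM is involved in deciding where to look: the query is embedded and compared against every record directly.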

> This approach reduces the likelihood of hallucinations among LLMs.

This has not been my experience. Did you create any benchmarks as a part of this project?

I am the author of this article. What we actually tried to do was replicate the simplest implementation of Retrieval-Augmented Language Models (RALMs) by prompting the LLM. There is a lot of research on this topic right now, such as this work from Meta (https://arxiv.org/pdf/2208.03299v3.pdf). I think it gives you a picture of how RALMs boost performance on general QA tasks.
This idea is a simplified version of Retrieval-Augmented Generation (RAG), and RAG has been studied in various research papers, such as the one available at https://arxiv.org/abs/2005.11401
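For anyone unfamiliar with the pattern, here is a minimal sketch of the retrieve-then-prompt idea. It is not the article's implementation: the word-overlap retriever, the prompt wording, and the function names are all assumptions, and the resulting prompt would still need to be sent to an actual LLM.

```python
import re

def words(text):
    """Lowercase a string and split it into words, dropping punctuation."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(question, facts, top_k=2):
    """Rank facts by how many words they share with the question (a toy retriever)."""
    q_words = words(question)
    ranked = sorted(facts, key=lambda fact: len(q_words & words(fact)), reverse=True)
    return ranked[:top_k]

def build_prompt(question, retrieved):
    """Ask the model to answer from the retrieved facts only, not from its own memory."""
    context = "\n".join(f"- {fact}" for fact in retrieved)
    return (
        "Answer using only the facts below. "
        "If they are not sufficient, say you do not know.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}\n"
        "Answer:"
    )

# Hypothetical fact store standing in for the database.
facts = [
    "Mount Everest is 8849 metres tall.",
    "The Nile flows through Egypt.",
    "K2 is 8611 metres tall.",
]
question = "How tall is Mount Everest?"
prompt = build_prompt(question, retrieve(question, facts, top_k=1))
print(prompt)
```

The model is asked to summarize what was retrieved rather than recall the answer, which is the whole trick. Nothing here forces the model to obey the instruction, so the output can still be wrong.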
My experience with RAG is that while it reduces the incidence of hallucinations* significantly (especially if you reduce the LLM temperature to zero at the same time), it doesn't eliminate them.

My startup has a product for lawyers that uses RAG to answer legal queries (https://lawlight.ai/). We have a disclaimer that "... (we) do not guarantee the accuracy of answers. You are responsible for reviewing the cited case law and drawing your own independent conclusions."

(This works within the specific context—lawyers are domain experts; and they are supposed to read through all cases they cite in court anyway.)

* I dislike the term "hallucinations." By definition LLMs hallucinate. It's just that much (or most) of the time, the hallucinations reflect reality.