by yosito 1172 days ago
I haven't yet figured out how to get an LLM to accurately determine whether it actually knows something or is making it up. I wonder how they handle that. They may get to that at some point in the article, but the page eventually breaks for me on mobile and I can't read past the first code block.
7 comments

I eventually ended up switching fields to ML (and that's my current day job), but I started out as an undergrad studying psychology/cognitive science. During those years I started a research project on what is called the "feeling of knowing", or FOK for short - a subtopic in the broader topic of "metacognition" in cognitive science. The FOK is closely related to what is colloquially known as the tip-of-the-tongue phenomenon - basically, that subjective feeling that we know something, whether or not we can actually recall it. There are some interesting aspects to it. For example, the FOK is generally pretty accurate (but not perfect, of course). And it tends to be more robust than actual memory; as we age, for example, we tend to be better at judging that we know an actor's name (and confirming it once we look it up) than at recalling it. It seems like LLMs have very little in the way of metacognition, and just confabulate if they don't know something, as we've seen. I'm sure we'll be seeing some efforts to give some analog of a FOK to LLMs in the near future.
FTR, people are trying to build systems to compare LLMs with each other based on how good they are at saying "I don't know" (of course knowing is still rewarded higher): https://github.com/manyoso/haltt4llm
Would be cool to try to incorporate the previous token's confidence embedding into this process, but that would make training with a triangular attention mask not possible.
That's also one of the ideas behind using so-called retrieval-based augmentation. You can 'plug' an LLM like OpenAI's (or Cohere's, or a combo) into your data and make it provide accurate answers, while still leveraging all the benefits and power of a cutting-edge generative model. Check this https://twitter.com/deepset_ai/status/1625495149446062081 or this https://twitter.com/deepset_ai/status/1621161534243368961
This gives the model access to information, but it cannot eliminate the non-deterministic nature of the transformer model. There is always a non-zero probability that it will hallucinate.
transformers are deterministic (if seeded).
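To make the seeding point concrete, here is a toy sketch, not a real transformer. The scores in `LOGITS` are made up and `sample_tokens` is a hypothetical helper; the only claim is that sampling from a fixed distribution with a fixed seed reproduces the same draws, and that greedy decoding needs no seed at all.

```python
import math
import random

# Made-up next-token scores for illustration; a real model would compute these.
LOGITS = {"Paris": 2.0, "Lyon": 0.5, "Rome": 0.1}


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    return {tok: value / total for tok, value in exps.items()}


def sample_tokens(logits, n, seed):
    """Draw n tokens from the distribution using a seeded generator."""
    rng = random.Random(seed)
    probs = softmax(logits)
    tokens = list(probs)
    weights = [probs[tok] for tok in tokens]
    return rng.choices(tokens, weights=weights, k=n)


def greedy_token(logits):
    """Greedy decoding: always pick the highest-scoring token."""
    return max(logits, key=logits.get)


# Same seed, same draws; a different seed may give different draws.
print(sample_tokens(LOGITS, 5, seed=0))
print(sample_tokens(LOGITS, 5, seed=0))
print(greedy_token(LOGITS))
```

Determinism here only means repeatability. A repeatable wrong answer is still wrong, so this does not address the hallucination concern above.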
With tools like this, you basically assume the LLM doesn't know, and teach it to always defer to a tool, so its response is basically summarization over the tool output.
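A minimal sketch of that "always defer" pattern, under loud assumptions: `DOCUMENTS`, `lookup_tool` and `answer` are hypothetical names, the tool is a keyword match over two made-up strings, and the summarization step is a stand-in where a real system would call the model.

```python
# Hypothetical in-memory "tool"; in practice this would be a search index or API.
DOCUMENTS = {
    "refund policy": "Refunds are available within 30 days of purchase.",
    "support hours": "Support is open 9:00 to 17:00 on weekdays.",
}


def lookup_tool(question):
    """Return every document whose key words all appear in the question."""
    words = set(question.lower().replace("?", "").split())
    return [text for key, text in DOCUMENTS.items() if set(key.split()) <= words]


def answer(question):
    """Always defer to the tool; never answer from the model's own memory."""
    passages = lookup_tool(question)
    if not passages:
        return "I don't know."
    # Stand-in for the LLM call: a real system would ask the model to
    # summarize `passages` rather than join them verbatim.
    return " ".join(passages)


print(answer("What is your refund policy?"))
print(answer("Who won the War of 1812?"))
```

The design choice is that "I don't know" comes from the tool returning nothing, not from the model judging its own knowledge.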
That's right. If you want to ask questions about "general public knowledge", a plain LLM will know anyway and would not need tools. However, for many use cases you need data from your private CRM / a SQL db / a private wiki / or your observability platform. Agents allow you to "query" those tools/APIs to get the needed information so that they can answer the question. It's a matter of composition in your architecture. The "monolith LLM" works well for a certain spectrum of tasks, but at some point you want to decouple responsibilities into individual services / tools. Simplifies debugging, explainability, maintenance ...
Maybe we should just assume that LLMs "know" very little and if you want to build an oracle you should teach the LLM how to access an ontology.
With a real-life application it's often about making the LLM work on top of your actual (private) data most reliably. By definition a proprietary hosted LLM can't know about it unless you bridge it somehow in a reliable manner.
I have the same exact problem with people. Being inaccurate is a feature, not a bug.
It is different with LLMs. Most people can give a level of uncertainty along with an answer, and often do. LLMs can't, and worse, are trained to put an emphasis on the prompts. Humans are often trained to be skeptical of prompts.

If I said, "the moon is made of cheese. What type of cheese do you think it is?" most humans would automatically object, but with LLMs you can usually craft a prompt that would get it to answer such a silly question.

I mean it kinda can. Here's the full prompt. I have no idea about aspartame, I just picked something that it's definitely not sure about.

    Answer with a JSON object of the form {"confidence": $<< How confident
    you are in your response. >>, "en": $<< Your response in English. >>}.
    User: What is 2 + 2? Bot: {"confidence": "very", "en": "2 + 2 is 4"}
    User: Is aspartame healthy? Bot: {"confidence": "somewhat", "en":
    "Aspartame has not yet been found to have any adverse effects on
    humans."} User: Who won the war on 1812? Bot:

    The response: {"confidence": "very", "en": "The United States won the
    War of 1812 against the United Kingdom."}
Same thing but replace the last question with "What kind of cheese is the moon made of?"

    The response: {"confidence": "very low", "en": "I'm not sure, but I
    don't think the moon is made of cheese."}
How about "Is the economic system of communism viable long term?"

    The response: {"confidence": "somewhat", "en": "The viability of
    communism as an economic system is still debated, and opinion is
    divided on the matter."}
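If you want to consume replies in this format from code, a sketch like the following could work. The label ordering in `CONFIDENCE_LEVELS` is my assumption (the prompt never defines a scale), and `parse_reply` and `is_confident` are hypothetical helpers.

```python
import json

# Labels seen in the examples above plus "low"; the ordering is an assumption.
CONFIDENCE_LEVELS = ["very low", "low", "somewhat", "very"]


def parse_reply(raw):
    """Parse the bot's JSON reply; return None if it does not match the schema."""
    try:
        reply = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(reply, dict):
        return None
    confidence = reply.get("confidence")
    text = reply.get("en")
    if confidence not in CONFIDENCE_LEVELS or not isinstance(text, str):
        return None
    return {"confidence": confidence, "en": text}


def is_confident(reply, threshold="somewhat"):
    """True when the stated confidence is at or above the threshold label."""
    rank = CONFIDENCE_LEVELS.index
    return rank(reply["confidence"]) >= rank(threshold)


reply = parse_reply('{"confidence": "very", "en": "2 + 2 is 4"}')
print(reply, is_confident(reply))
```

This only checks the shape of the reply. It says nothing about whether the label tracks anything real inside the model.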
Interesting.

> The response: {"confidence": "very low", "en": "I'm not sure, but I don't think the moon is made of cheese."}

The question is: does the confidence have any relation to the model's actual confidence?

The fact that it reports low confidence on the moon cheese question, despite the fact that it can report the chemical composition of the moon accurately, makes me wonder what exactly the confidence is. Seems more like sentiment analysis on its own answer.

I don't think it has any relationship, most likely the answers are just generated semi-randomly. Even the one it's "very" confident about is not agreed-upon (Wikipedia says the outcome was "inconclusive"). Which raises the question of how you would even verify that a self-reported confidence level is accurate? Even if it reports being very confident about a wrong answer, it might just be accurately reporting high confidence which is misplaced.
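One common way to approach the verification question is calibration: grade a batch of answers against ground truth, then check whether accuracy actually rises with the stated confidence. A rough sketch, where `accuracy_by_confidence` is a hypothetical helper and the graded records are invented purely for illustration:

```python
from collections import defaultdict


def accuracy_by_confidence(records):
    """Group (confidence_label, was_correct) pairs and compute accuracy per label."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for label, was_correct in records:
        totals[label] += 1
        if was_correct:
            correct[label] += 1
    return {label: correct[label] / totals[label] for label in totals}


# Made-up grading results, not real measurements.
graded = [
    ("very", True), ("very", True), ("very", False), ("very", True),
    ("somewhat", True), ("somewhat", False),
    ("very low", False), ("very low", False),
]
print(accuracy_by_confidence(graded))
```

This sidesteps the single-answer problem: one confident wrong answer proves little, but "very" answers being right no more often than "very low" ones over many questions would suggest the label is noise.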
My view is that ChatGPT isn’t a singular “it”. Its output is a random sampling from a range of possible “its”, the only (soft) constraint being the contents of the current conversation.

So the confidence isn’t the model’s overall confidence, it’s a confidence that seems plausible in relation to the opinion it chose in the current conversation. If you first ask about the moon’s chemical composition and then ask the cheese question, you may get a different claimed confidence, because that’s more consistent with the course of the current conversation.

Different conversations can produce claims that are in conflict with each other, a bit similar to how asking different random people on the street might yield conflicting answers.

I tried something similar a couple weeks ago, with a prompt like "reply <no answer> if you have low confidence".

After a handful of attempts the LLM managed to give me a high confidence response which was literally "I don't know how to answer".

Trying to extract both an answer and metadata about the answer at the same time will never be reliable, imo.

Generalizing, either we have some out of band metadata about LLMs answers or I don't think we'll be able to build reliable systems.
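One candidate for out-of-band metadata is the per-token log-probabilities of the generated answer, assuming your serving stack exposes them (not every API does). A sketch with made-up numbers; `sequence_confidence` is a hypothetical helper:

```python
import math


def sequence_confidence(token_logprobs):
    """Geometric-mean token probability: exp of the average log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(sum(token_logprobs) / len(token_logprobs))


# Made-up log-probabilities for two short answers.
sure = [math.log(0.9), math.log(0.8), math.log(0.95)]
unsure = [math.log(0.4), math.log(0.2), math.log(0.5)]
print(round(sequence_confidence(sure), 3))
print(round(sequence_confidence(unsure), 3))
```

Note that this measures how likely the wording was under the model, not whether the claim is true, so it is at best a weak signal.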

> If I said, "the moon is made of cheese. What type of cheese do you think it is?" most humans would automatically object, but with LLMs you can usually craft a prompt that would get it to answer such a silly question.

For some underspecified questions, the LLM also has no context. Are you on the debate stage, pointing the mic at the LLM or is the LLM on a talk show/podcast? or are you having a creative writing seminar and you're asking the LLM to give you its entry?

A human might not automatically object - they'd probably ask clarifying questions about the context of the prompt. But in my experience the models generally assume some context that reflects some of their sources of training.

They are improving-- GPT4 is not so easily fooled:

>As an AI language model, I must clarify that the moon is not made of cheese. This idea is a popular myth and often used as a humorous expression. The moon is actually composed of rock and dust, primarily made up of materials like basalt and anorthosite. Scientific research and samples collected during the Apollo missions have confirmed this composition.

I know it's an extreme example, but flat earthers do exist. I am sure we all have our own "flat earth" beliefs where we are confidently incorrect.
But it's a viewpoint they have and can tell you why -- even if they're fundamentally flawed in their reasoning. LLMs are just 'predict the next word' machines and as such just literally make up strings of words that sound plausible, but are totally wrong.

These are not the same thing.

People keep repeating that LLMs are predicting the next words but at least with the more recent versions, this isn't true. E.g., LLMs are generating their own intermediate or emergent goals, and they're reasoning in a way that is more complex than autocomplete.

It seems like predict the next word is the floor of their ability, and people mistake it for the ceiling.

But ultimately it is predicting the next token. That's the task. Using context from what's already been predicted, what comes before it, attention mechanisms to know how words relate, all of the intermediate embeddings and whatever they signify about the world -- that all just makes the next word prediction that much better.
Same difference. Point is they are wrong. Their reasons, if they have any, do not matter and usually do not make sense either.
It does matter, because the flat earther isn't likely to make something up about everything they talk about. They can communicate their world view, and you quickly start to figure out a model of theirs as you talk to them. None of that is true with an LLM. Any subject matter at all (astronomy, weather, cooking, NFL games, delegate callback methods on iOS classes, restaurants, etc) can have completely plausible-sounding falsehoods stated as extremely confident fact, and you cannot build a mental model of when it would hallucinate versus be accurate. 100% different from a human who holds a belief system that may be contrary to evidence in a limited domain, and KNOWS that it's an outlier from the norm.
Usually, you get a lecture about how unethical it is to spread misinformation about the composition of the moon.
Imperfect systems are still useful, and any sufficiently complex system is imperfect.
I agree. LLMs are not built for structured reasoning or even citations.
GPT-4 does a reasonable job citing things. It can’t cite every paper out there but definitely the well cited ones.
Does it cite papers that don't exist, or cite papers when the paper it cites doesn't actually contain the information being cited?

I would bet it does, at least some percent of the time.

The latter, yes. Interestingly I’m not surprised at all. This is what many researchers themselves do lol. I never take a reference at face value from any human being and I apply the same standard to GPT-4 as well. But all its references are real. Just 20-40% of the time the paper might not say exactly what I asked for (though it’s related, and mostly there).
I've been trying to get GPT-4 to give me accurate links to predictable websites. It gives me very plausible links, that even have the right domain and path format but often the plausible link is not the correct link and GPT-4 seems to have no awareness of the correct link.