Hacker News
by plaguuuuuu 643 days ago
LLMs have no direct recollection of the qualia of their own training. That kind of recollection is a major way that I self-correct: if I'm about to talk about something I know, I'll try to figure out how and why I know it and, in so doing, gauge whether I actually know it, whether I'm hallucinating, or whether I heard it from a less-than-reliable source.

I don't think LLMs can self-correct without remembering their own training in some way.

5 comments

So you’re saying the solution is to prefix each training batch with a description of a sensory experience (You read the following in a Paris cafe in 1997. While you read, you have an excellent baguette, some boiled eggs, and over-roasted coffee. The woman one table over is wearing a beautiful blue hat) and then post-train the final model into recalling the setting where it read any piece of text, or failing to recall any experience when presented with text it didn’t read?

(If someone tries this and it works, I’m quitting my PhD and going back to camp counseling)

I don't think that's what they're saying at all. They're talking not about qualia in the human sense, but specifically about "the qualia of their own training". That is, the corpus that LLMs "learn" from and the "experiences" of those texts that are generalized during the training process. Both the raw data and the memory of "learning" are discarded.

So if one were to improve an LLM along those lines, I believe it would be something like: 1) LLM is asked a question. 2) LLM comes up with an initial response. 3) LLM retrieves the related "learning" history behind that answer and related portions of the corpus. 4) LLM compares the initial answer with the richer set of information, looking for conflicts between the initial answer and the broader set, or "learning" choices that may be false. 5) LLM generates a better answer and gives it. 6) LLM incorporates this new "learning".

And that strikes me as a pretty reasonable long-term approach, if not one that fits within the constraints of the current gold rush.
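
The numbered steps above can be sketched as a toy loop. This is only an illustration under loud assumptions: `ToyModel`, `Provenance`, the `reliability` score, and the reliability-weighted vote are all hypothetical stand-ins, not how any real LLM stores or retrieves its training history.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Provenance:
    """Hypothetical record of where a claim was 'learned'."""
    source: str
    claim: str
    reliability: float  # made-up scale: 0.0 (untrusted) to 1.0 (trusted)


@dataclass
class ToyModel:
    # topic -> everything "learned" about it; stands in for the learning history
    memory: Dict[str, List[Provenance]] = field(default_factory=dict)
    # (topic, draft, revised) tuples; stands in for incorporating new "learning"
    revisions: List[Tuple[str, str, str]] = field(default_factory=list)

    def learn(self, topic: str, claim: str, source: str, reliability: float) -> None:
        self.memory.setdefault(topic, []).append(Provenance(source, claim, reliability))

    def draft(self, topic: str) -> Optional[str]:
        """Initial response: the most recently learned claim, with no checking."""
        records = self.memory.get(topic, [])
        return records[-1].claim if records else None

    def answer(self, topic: str) -> str:
        draft = self.draft(topic)
        # Retrieve the learning history behind the draft.
        records = self.memory.get(topic, [])
        if draft is None:
            return "I don't know"
        # Compare the draft against the broader set: weight each claim by
        # the reliability of the sources that support it.
        scores: Dict[str, float] = {}
        for record in records:
            scores[record.claim] = scores.get(record.claim, 0.0) + record.reliability
        best = max(scores, key=scores.get)
        # Incorporate the correction so it can be inspected later.
        if best != draft:
            self.revisions.append((topic, draft, best))
        return best
```

With a trusted source saying "100 C" and a later, weaker source saying "90 C", `draft` returns the weaker claim while `answer` returns the better-supported one and logs the revision.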

So...reinforcement learning?
Sort of like this? It does help: Source-Aware Training Enables Knowledge Attribution in Language Models (https://arxiv.org/abs/2404.01019)

From the abstract:

> ... To give LLMs such ability, we explore source-aware training -- a recipe that involves (i) training the LLM to associate unique source document identifiers with the knowledge in each document, followed by (ii) an instruction-tuning stage to teach the LLM to cite a supporting pretraining source when prompted.
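
To make the two stages in that abstract concrete, here is a rough sketch of what the training examples might look like. The `<src>` tag format, the function names, and the prompt wording are my own assumptions for illustration, not the paper's actual data format.

```python
def tag_document(doc_id: str, text: str) -> str:
    """Stage (i): attach a unique source identifier to a pretraining document.

    The <src>...</src> wrapper is a hypothetical format.
    """
    return f"{text} <src>{doc_id}</src>"


def citation_example(question: str, answer: str, doc_id: str) -> dict:
    """Stage (ii): build one instruction-tuning pair that asks for a citation.

    The prompt wording is an assumption made for this sketch.
    """
    return {
        "prompt": f"{question}\nCite a supporting pretraining source.",
        "response": f"{answer} <src>{doc_id}</src>",
    }
```

The idea is that after stage (i) the identifier is associated with the document's content, and stage (ii) teaches the model to surface that identifier on request.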

Strong disagree: https://mypapers.nyc3.cdn.digitaloceanspaces.com/the_phenome...

See also: https://www.sciencedirect.com/science/article/pii/S157106452... o1's training regime is described by the "strange particle" model in this formulation

I think you're overweighting the value of that in day-to-day use. As folks accumulate knowledge, a common pattern (especially for things not embedded in a framework - trivia-like data) is an "I have no idea why I'd know this, but the answer is X".

But even if it's embedded in a framework, say CS, the qualia fade in the background as time passes. E.g. like everybody in CS, I'm pretty much able to quote O() performance characteristics of a sizeable number of algorithms off the bat. If you ask me where I learned it, for that specific algorithm - that's long receded into the past.

When humans self-correct, the normal process isn't "gauging whether you know the thing" or the even more impressive feat of calling up if you heard it from a "less than reliable source". There's a fuzzy sense of "I don't fully understand it", and self-correction means re-verifying the info from a trusted source.

So, no, I don't think the qualia matter for recall as much as you think.

Unless you’re under the influence of something or having a severe mental health crisis, you are not hallucinating, you’re confabulating.
According to which philologist? In short: they are both weak terms, 'hallucination' and 'confabulation', and we are using them in this context very loosely (and it should be in the open).

About the terms themselves, "confabulate" means "exchanging stories", while "hallucinate" is less clear but probably means "to err". In psychiatry, "hallucinate" was apparently introduced by Esquirol and "confabulate" by Wernicke and Bonhoeffer; neither concept seems to be akin to the substance of the phenomenon of "stochastic parrots bullshitting an unchecked narrative through formal plausibility".

See: "Hallucinations and related concepts - their conceptual background" - https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4515540/

and: "The Confabulating Mind: How the Brain Creates Reality" - https://psychiatryonline.org/doi/full/10.1176/appi.ajp.2008....

Let's invent a new term, then!

I propose:

- digimagination

- stochafubalation

- statillucination

- matmulshitting

Let's ask chatgpt free (coz I'm cheap) what it not-thinks about these, and ask for more statistically generated bullshit:

-- 8< --

These are some creative and fun suggestions! Here's my take:

    Digimagination: This one has a sleek, almost sci-fi ring to it. It captures the digital, machine-driven nature of language model "hallucination," but presents it in a positive, imaginative light. This could highlight the creative aspect of AI responses, even when they aren't grounded in fact.

    Stochafubalation: Combining "stochastic" (random probability) with "confabulation" is clever. It emphasizes the statistical nature of language models making up facts, though it could be a bit of a mouthful.

    Statillucination: I like how this merges "statistical" and "hallucination." It feels technical and would fit right into the AI research community's vocabulary.

    Matmulshitting: This one is tongue-in-cheek and hilarious! It references matrix multiplication (which drives many neural network operations) with a humorous, blunt twist. Probably not for formal use, but great for informal contexts!
Here are a few more neologisms I can propose:

    Probafabrication: A combination of "probability" and "fabrication," highlighting the statistical underpinnings of how LLMs generate falsehoods.

    Algoillusion: A fusion of "algorithm" and "illusion," suggesting the misleading or illusionary nature of generated information.

    Synthifalse: A contraction of "synthetic" and "false," pointing to AI-generated responses that are not grounded in reality.

    Pseudoformulation: Emphasizing that the response is a pseudo (false) formulation rather than a real answer.

    Modelucination: A blend of "model" and "hallucination," capturing the fact that the machine learning model is generating false or misleading data.
My turn. What about "pretending"?

My judgement of yours: I love 'stochfabulating' and 'matmulshitting', but I condemn 'digimagining'. Digimagining is doable in proper terms (no pun intended).

"Pretending" is too human for my taste: it assumes the thing doing the pretending knows about the real thing. It's something kids do during play. I'm too afraid of the consequences to admit LLMs are anywhere near this situation wink