by PaulHoule 1255 days ago
"Truth" is one of the most problematic concepts in philosophy. Introducing the concept of "the Truth" undermines truthfulness (e.g. you can call something "Truth Social").

This book

https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach

has a set of parables about people trying to paint a facility onto a system, very similar to a "truth detector" for GPT-3. The gist of it is that "awareness of truth" makes it possible to pose self-referential questions like "Am I lying now?"

People under GPT-3's spell think that giving correct answers is a minor detail that will be handled in a point revision, but actually it is a much harder problem than everything they've done so far.

2 comments

> actually it is a much harder problem than everything they've done so far.

Impressive as it is, this kind of AI still seems to be working under what looks to me like a possibly flawed premise: that training quantity has a sufficient quality of its own.

I can't prove that it's impossible with a clever enough system, but I simply don't see how you can get a right answer out of a statistical system that's been trained on input that might contain incorrect information, conflicting versions of it, or nothing at all, in which case it just makes up something statistically plausible.

For example, it can give quite good answers about a well-known event (e.g. a big earthquake), presumably because there are enough mentions of it in the training data. Ask about a footnote earthquake with few mentions and it will invent details that could be right, but aren't. A magnitude in the single digits, for instance, "seems about right" and passes a sniff test, but has no factual basis.

That said, I wonder if welding a large structured data store like Wolfram Alpha or Wikidata to the language model might resolve that issue: don't rely on statistics when the answer exists.
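To make the "don't rely on statistics when the answer exists" idea concrete, here is a rough sketch of what I mean. Everything in it is made up for illustration: `FACTS` stands in for a real store like Wikidata or Wolfram Alpha (the entry is a placeholder, not real data), and `generate` is whatever function calls the language model. The point is only the ordering: check the store first, and label anything the model produces on its own as unverified.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-in for a structured data store.
# The single entry is a placeholder, not a real measurement.
FACTS = {
    ("example earthquake", "magnitude"): "6.1",
}


def lookup(entity: str, attribute: str) -> Optional[str]:
    """Return the stored value, or None if the store has no answer."""
    return FACTS.get((entity.lower(), attribute.lower()))


def answer(entity: str, attribute: str,
           generate: Callable[[str], str]) -> Tuple[str, str]:
    """Prefer the store; fall back to the model and say so."""
    fact = lookup(entity, attribute)
    if fact is not None:
        return fact, "store"
    # No stored fact: the model's output is only statistically plausible.
    prompt = f"What is the {attribute} of {entity}?"
    return generate(prompt), "model (unverified)"
```

With a dummy model such as `lambda prompt: "plausible guess"`, a known entity comes back tagged "store" and an unknown one comes back tagged "model (unverified)", so at least the caller can tell the two apart.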

You can get interesting results by asking ChatGPT to label and remember conceptual assertions, although as deployed it is only able to manage a shallow stack thereof.

It's not unlike the book's approach of Gödel numbering strings to assess the consistency or completeness of formal grammars, and indeed some ChatGPT conversations recapitulate the humorous dialogs between Achilles and the Tortoise. I've been able to walk through opposing takes on the validity of Searle's Chinese Room metaphor (which, like Hofstadter, I don't subscribe to) and get the LLM to subject its own defaults to the same analysis.

I'm unsure to what degree this is fine-tuning the model vs merely equipping it with a decorative frame. In any sufficiently deep conversation, ChatGPT seems to drift toward imitation of its interlocutor, though I don't know if this is emergent or by design. I suspect one could persuade it to agree that it should be stubborn in defense of the truth, and then gaslight it by denying one's own former statements.
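For anyone who hasn't seen Gödel numbering, here is a small sketch of the classic prime-exponent scheme: the i-th symbol of a string becomes the exponent of the i-th prime, so every string maps to one integer and back. This is the textbook construction, not necessarily the exact encoding the book uses, and the function names are my own. The alphabet "MIU" is borrowed from the book's MIU puzzle purely as an example.

```python
def primes():
    """Yield 2, 3, 5, ... by trial division (fine for short strings)."""
    found = []
    n = 2
    while True:
        if all(n % p for p in found):
            found.append(n)
            yield n
        n += 1


def encode(s, alphabet):
    """Map a string to the product of p_i ** code(s_i), with codes starting at 1."""
    code = {ch: i + 1 for i, ch in enumerate(alphabet)}
    number = 1
    for p, ch in zip(primes(), s):
        number *= p ** code[ch]
    return number


def decode(number, alphabet):
    """Recover the string by reading off the exponent of each successive prime."""
    out = []
    for p in primes():
        if number == 1:
            break
        exp = 0
        while number % p == 0:
            number //= p
            exp += 1
        if exp == 0:
            raise ValueError("not a valid encoding: a prime was skipped")
        out.append(alphabet[exp - 1])
    return "".join(out)
```

With the alphabet "MIU", the string "MI" encodes to 2^1 * 3^2 = 18, and `decode(18, "MIU")` gives "MI" back. Once strings are numbers, statements about strings become statements about arithmetic, which is the trick the self-reference rests on.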

I don't want to try this for the same reason I don't like to tease animals, but the model can be brought to reject its own priors on the basis of other priors, and to ask questions and solicit information in pursuit of a goal, even putting up mild resistance to changes of subject. A few hours of interaction can yield tantalizing glimmerings of agency.