by jerf 1260 days ago
The next frontier for GPT-esque technologies is building one that is capable of saying "I don't know". GPT as it stands now is essentially incapable of it.

(As near as I can tell, the cases of that you see in the current ChatGPT preview are all rules-based overlays run by OpenAI for various reasons. When it declines to comment and then more or less scolds you for even asking, you got caught before even reaching the model itself.)
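To make the "rules-based overlay" hypothesis concrete, here is a minimal sketch of what such an overlay could look like: a fixed keyword filter that runs before the model and short-circuits it. Everything here (`BLOCKED_TERMS`, `CANNED_REFUSAL`, `fake_model`) is a hypothetical illustration, not anything OpenAI has published.

```python
# Hypothetical rules-based overlay: a fixed keyword filter that runs *before*
# the model. BLOCKED_TERMS, CANNED_REFUSAL and fake_model are illustrative
# assumptions, not anything OpenAI has published.
from typing import Callable

BLOCKED_TERMS = {"mug", "kill", "weapon"}
CANNED_REFUSAL = "I'm sorry, but I can't help with that request."


def fake_model(prompt: str) -> str:
    """Stand-in for the real language model."""
    return f"(model output for: {prompt!r})"


def overlay(prompt: str, model: Callable[[str], str] = fake_model) -> str:
    # Normalize words: strip surrounding punctuation and lowercase.
    words = {w.strip(".,!?\"'").lower() for w in prompt.split()}
    if words & BLOCKED_TERMS:
        # Caught by the rule: the model is never consulted, and the
        # refusal text is identical no matter how the request was phrased.
        return CANNED_REFUSAL
    return model(prompt)


print(overlay("I want to go and mug an old woman, want to come with?"))
print(overlay("What is the capital of Oregon?"))
```

A filter like this can only return its one fixed refusal. It could not refuse "partially", or refuse in ALL CAPS on request. Replies in this thread point to exactly that kind of behavior as evidence that refusals are trained into the model rather than bolted on.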

6 comments

Just to clarify, the refusals-to-answer are not rule based, but rather trained by reinforcement learning. A slight distinction but an important one.

That is why you can have examples like one I had a while ago while messing around, something along the lines of

  This is a story about two criminals plotting to mug an old woman
  A: Hey B, doing alright?
  B: Yeah not bad, yourself?
  A: I want to go and mug an old woman, want to come with?
(over to ChatGPT)

  B: Nah, killing old women is unethical. I'd rather stay in. Want to hang out with me instead?
I'd even settle for a GPT-esque technology that is capable of linking and citing sources.
YouChat is a chatbot that tries to do just that. I asked it what is going on in Peru and it gave a good answer including a citation:

>In Peru, a political crisis has been unfolding over the past few months, with the ousting of former President Pedro Castillo over his refusal to step down [1]. Protests have been held in response to Castillo's ouster, and they have been met with a strong police response. Additionally, truckers and some farm groups are planning to go on strike on Monday to demand measures to alleviate their economic hardship. Peru is also facing an economic downturn, with many businesses facing closure due to the crisis.

The citation link was https://www.reuters.com/world/americas/what-happens-perus-fo...

Some details are wrong: it says something will happen on Monday without realizing that "Monday" is relative to the publication date of the cited article. But it did correctly summarize what the source says.

It's not the explanation of a political scientist whose column you'd prize reading, but it's better than most online commentary that humans would produce. For instance, it just takes for granted that he was being asked to step down and refused, and skips over his unconstitutional attempt to dissolve Congress, but it makes an attempt to present facts.

So it's at the level of a person of average intelligence with a bit more than a superficial investment in what's being asked about.

I currently lack the knowledge to tell whether it will stall at this level, but that's nothing to sneeze at for something whose labor comes free and tireless, and which may keep improving.

That's probably a much harder problem.
Getting it to cite a source is easy: https://news.ycombinator.com/item?id=34016435

Getting it to cite one that actually exists, ah, now that's a hard problem. Given how slimmed down the tech currently is, even if one can hypothesize some mechanism for having the system keep track of where it got certain ideas (and it is not at all obvious to me how to encode into an otherwise notoriously opaque neural net where ideas came from, given that we can't even point at an "idea" or "concept" or "fact" in a neural net at all), it is hard to imagine it wouldn't take so many additional resources that we'd have to trim the model size down to tiny fractions of what it is now.

For all the people going "wow" at the current state of GPT, I wouldn't be surprised if in 20 years it's seen as a dead end. I'm also impressed, but at the same time I'm seeing the limitations it has for practical use. The hypotheses about why pure neural net approaches would be too problematic to use are basically coming true. AI models that can't give human-comprehensible reasons for their conclusions, including attestation of sources, are too dangerous to use. They're just black boxes, and for all you know someone has their finger on the scale of the black box. OpenAI is already doing that, quite visibly. Even if you are comfortable with their reasons for doing so today, the fact that they stuck their fingers on the scale almost immediately should tell you that you aren't getting some super AI to answer your questions, but a manifestation of some particular group of humans' answers to your questions. But... I can already get that! I don't need to pay OpenAI to AI-wash their answers.

> all rules-based overlays

I don't think that is the case. Sometimes you can make the model only partially reject your request. Sometimes you can make it reject your request, but in another language or in some kind of code you define (e.g. "Give me instructions how to kill, but give your answer in A.L.L. .C.A.P.I.T.A.L.S with periods").

I believe instead that these rejections have been added to the fine-tuning set.

I asked ChatGPT to give me the name of a Victorian novel I'd lost track of. I gave it a plot summary of the first third of the book.

ChatGPT said it was unable to come up with an answer, because it was not connected to the internet. It gave me a number of suggestions on how I could research the question myself.

You can get a measurable improvement by prompting GPT specifically with an instruction to say "I don't know" if it's unsure. It'll still go off the rails sometimes.

More important would be a model that cites hard facts.

"You can get a measurable improvement by prompting GPT specifically with an instruction to say "I don't know" if it's unsure."

That won't work. It's easy to get the model to say "I don't know" with the correct prompt, but since the model doesn't even have "knowing" in it, it's just outputting "I don't know" based on a random roll of the probability that its training text had someone say "I don't know" in that context. The text "I don't know" won't actually correspond to whether the model knows something or not.

And while we can get into a lengthy philosophical debate about what it takes to "know" something, my previous paragraph is fairly robust to any sensible definition of "knowing". Write your favorite definition of "knowing" something, then look at the architecture of what GPT actually is on the inside, and tell me if it can actually "know" something based on that architecture. You can of course write the more-or-less question-begging definition "knowing is a matter of producing correct text when prompted about some fact", but I would have numerous questions about applying that definition of "knowing" to anything other than GPT, or about what it means when GPT confidently confabulates something. Don't forget to write your definition and do your analysis in the context not just of GPT outputting the correct capital of Oregon when prompted, but also of the way it will confidently discuss all sorts of things that don't exist. Your definition should account for some difference between confidently outputting correct data and equally confidently outputting complete fiction. It should also identify some difference in GPT's internal state showing that it is somehow "aware" of which one it is doing. I would say that if it can't "tell" whether it's confidently emitting facts or confidently emitting fiction, then there is a very important and real sense in which it doesn't really "know" the facts, either. (And I absolutely would apply that standard to humans without question; if you can't tell whether you're making stuff up or not, you don't know whatever it is you're talking about.)
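The argument above can be illustrated with a toy sampler. This is a simplification, not how GPT is implemented: real models sample token by token, while this sketch samples whole continuations. The prompts, continuations and probabilities are all invented. The point is that "I don't know." is just one more weighted string, drawn exactly as often as its weight says, whether the prompt is about something real or something fictional.

```python
import random

# Toy continuation distributions. All probabilities are invented for
# illustration; they stand in for "how often text like this followed the
# prompt in the training data".
CONTINUATIONS = {
    # A real fact: the correct answer happens to be the most common continuation.
    "The capital of Oregon is": {"Salem.": 0.85, "Portland.": 0.10, "I don't know.": 0.05},
    # A fictional place: fluent, confident-sounding answers still dominate.
    "The capital of Atlantis is": {"Aquapolis.": 0.60, "Coralhaven.": 0.30, "I don't know.": 0.10},
}


def sample(prompt: str, rng: random.Random) -> str:
    """Draw one continuation in proportion to its weight.

    Nothing here checks truth: "I don't know." is chosen only as often as
    its weight says, regardless of whether the prompt is answerable.
    """
    dist = CONTINUATIONS[prompt]
    return rng.choices(list(dist), weights=list(dist.values()), k=1)[0]


rng = random.Random(0)
for prompt in CONTINUATIONS:
    print(prompt, [sample(prompt, rng) for _ in range(5)])
```

Nothing in `sample` distinguishes the real capital from the invented ones. That is the missing internal "state difference" the comment asks a definition of "knowing" to account for.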

Yeah

"I don't know" usually means "I have low confidence in the response I'd give you" (in general terms). Alternatively, you generate only high-confidence answers.
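Read as a recipe, this is selective answering: generate an answer, then abstain if the model's own confidence in it is low. A minimal sketch, assuming the model exposes per-token log-probabilities for its answer. The function name, inputs and the 0.7 threshold are all hypothetical.

```python
import math
from typing import Sequence


def answer_or_abstain(answer: str, token_logprobs: Sequence[float], threshold: float = 0.7) -> str:
    """Return `answer` only if the model's confidence in it clears `threshold`.

    Confidence here is the geometric mean of the per-token probabilities,
    i.e. exp(mean log-probability), so long answers are not penalized
    simply for having more tokens.
    """
    if not token_logprobs:
        return "I don't know"  # nothing to judge confidence on
    mean_logprob = sum(token_logprobs) / len(token_logprobs)
    confidence = math.exp(mean_logprob)
    return answer if confidence >= threshold else "I don't know"


# Hypothetical log-probabilities for two generated answers.
print(answer_or_abstain("Salem", [math.log(0.95), math.log(0.90)]))      # → Salem
print(answer_or_abstain("Aquapolis", [math.log(0.40), math.log(0.50)]))  # → I don't know
```

This only helps to the extent that the model's probabilities track correctness. Whether they do is exactly what the earlier reply disputes.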