Hacker News
by simondw 1101 days ago
I'm struggling to imagine a mental model of the LLM for which that would make sense. A human who's willing to lie a little, but comes clean when called out? A robot that mostly doesn't make mistakes, and is more likely to catch its own than make more?
6 comments

In my experimentation this ... works sometimes?

For example, I tried asking ChatGPT about cartoons from childhood. I wrote "What was that cartoon in the 1980s that was based on some kind of gummy candy?" and it correctly identified "The Adventures of the Gummi Bears". I wrote "Sing the theme song for me" and it produced the song missing the first verse. I wrote "That is missing the first verse!" and it produced the whole correct song.

On the other hand, when I asked it to describe the instrumental 90s X-Men theme song, it told me:

'...the lyrics are epic and uplifting, with lines like "We're the X-Men, we're the best there is at what we do." The song also has a sense of urgency and danger, with lines like "We're fighting for our lives" and "The mutant race will survive"...'

When I put "The X-Men theme song doesn't have lyrics", it readily accepted the correction, but unlike with the missing first verse, I wasn't really getting any verifiable information by making the correction.

And of course it was happy to tell me about a nonexistent Gummi Bears / Rescue Rangers crossover episode.

Honestly, at least when it comes to code in widely used languages, GPT-4 is very good at catching mistakes in generated output after one makes a simple request for a second check. In most cases, there isn't even a need to explain what the specific issue, concern or error in the provided code is.

This does make sense as, beyond what has been typed, there is no memory implemented in most of these models, so revisions are currently the only game in town to get more accurate results.

What surprised me most about this entire situation is that the model did insist on being correct. Normally ChatGPT has been set up to be a bit cautious and more likely to admit to having made a mistake, to the point where, if you ask in a direct manner like this lawyer did, the model may claim to have been incorrect even when the output was actually correct, in my experience. Bing's implementation of the same underlying model, meanwhile, can be so forceful in trying to convince users that the output is correct, even when provided with online resources that show the opposite, that it would not be unreasonable to feel gaslit by that LLM.

The rest of this situation was not very surprising, and I have to admit I am happy that this was caught right away. Lawyers' actions have a massive impact on countless people every day. If this had not become such a public scandal right away, perhaps a lot of defendants would have suffered under improper representation due to reliance on imperfect models.

My layman's understanding is that it’s not grounded. GPTs are schizophrenic, but unlike a schizophrenic person, who is delusional and fails to stay anchored to the reality he is in, GPT is actually up in the air.

That, and it’s just a language model: an approximation not of the world, nor of a body of knowledge, but of English, and not an answering machine at all.

His mental model is "magic".

This reminds me of a story, I believe referring to Pascal's demonstration of his (newly invented, entirely mechanical) calculator to the Royal Society. He showed that pressing certain levers in the correct order means you want to operate on certain numbers, and certain other levers mean choosing the operation you want, and then by cranking his calculator you could read off the answer of doing your operation on your numbers. World's first calculator! Someone asked: if you press the levers wrong, do you still get the right answer?

Their mental model is "magic" and they don't understand the details of how magic works, because it's magic.

This is not an unreasonable mental model. You can ask ChatGPT to give you a program, ask it "is this correct?" and it'll find and fix bugs. To a layperson it looks like it is capable of double checking its work and finding an error. Why would it be any different here?

(The answer, of course, is that the LLM doesn't actually search the internet and/or doesn't have access to a law database it can query.)

It's not a mental model and the bot isn't thinking. This is a limitation of the tech.
GP means the mental model that the lawyer had of ChatGPT: why did he think that he could check ChatGPT's work by just asking it "hey are you sure about that"?