by yldedly 1543 days ago
It doesn't. It's pattern matching, and you're seeing cherry-picked examples. The pattern matching is enough to give the illusion of understanding. There are plenty of articles where more thorough testing reveals the difference. Here are two: https://medium.com/@melaniemitchell.me/can-gpt-3-make-analog...

But you could also just try one of these models, and see for yourself. It's not exactly subtle.

https://www.technologyreview.com/2020/08/22/1007539/gpt3-ope...


GPT-3 was specifically worse at jokes, which is why PaLM being good at this impresses me so much. At any rate, I don't care if it only works one in ten times. To me, this is equivalent to complaining that the dog has bad marks in high school. (PaLM could probably explain that one to you: "The speaker is complaining that the dog is only getting C's. For a human, a C is quite a bad mark. However, getting even a C is normally impossible for a dog.")

"It's pattern matching" just sounds like an excuse for why its working "doesn't really count". At this point, you are asking me to disbelieve plain evidence. I have played with these models, people I know have played with these models, I have some impression of what they're capable of. I'm not disagreeing that it's "just pattern matching", whatever that means. I am asserting that "pattern matching" is Turing-complete, or rather, cognition-complete, so this is just not a relevant argument to me.

What do you think a neuron does?

>At any rate, I don't care if it only works one in ten times

>you are asking me to disbelieve plain evidence

If you threw a thousand tries at a Markov chain, to use the classic "pure pattern matcher", it could not do any fraction of what this model does, ever, at all. You would have to throw enough tries at it that it tried every number that could possibly come next, to get a hit. So one in ten is actually really good. (If that's the rate, we have zero idea how cherry-picked their results actually are.)

And the errors that GPT makes tend to be off-by-one errors, human errors, misunderstandings, confusions. It loses the plot. But a Markov chain never even has the plot for an instant.
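To make the "pure pattern matcher" concrete, here is a minimal sketch of a word-level bigram Markov chain in Python. The toy corpus, the function names, and the fixed seed are all made up for illustration. The point is only that each next word depends on nothing but the current word, so there is no place for a "plot" to live.

```python
import random
from collections import defaultdict

def build_bigram_model(text):
    """Map each word to the list of words that followed it in the text."""
    model = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        model[current].append(following)
    return model

def generate(model, start, length, seed=0):
    """Walk the chain: the next word depends only on the current word."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = model.get(out[-1])
        if not followers:  # dead end: this word never had a successor
            break
        out.append(rng.choice(followers))
    return " ".join(out)

# Hypothetical toy corpus, purely for illustration.
corpus = "the dog got a C in school . the dog chased a ball in the park ."
model = build_bigram_model(corpus)
print(generate(model, "the", 12))
```

Every adjacent pair of words in the output appeared somewhere in the corpus, but nothing ties the end of the sentence to its beginning.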

GPT pattern-matches at an abstract, conceptual level. If you don't understand why that is a huge deal, I can't help you.

It's a pretty big deal, and there's a big difference between a Markov chain and a deep language model: the Markov chain's quality will quickly plateau as you add data, while the language model can keep scaling with the data.

But the way these models are talked about is misleading. They don't "answer questions", "translate", "explain jokes", or anything of that sort. They predict missing words. Since the network is so large, and the dataset has so many examples, it can scale up the method of 1) Find a part of the network which encodes training data that is most similar to the prompt 2) Put the words from the prompt in place of the corresponding words in the encoding of the training data

i.e. pattern matching. So if it has seen a similar question to the one given in the prompt (and given that it's trained on most of the internet, it will find thousands of uncannily similar questions), it will produce a convincing answer.
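As a deliberately crude toy of the two steps described above (find the most similar stored example, then swap in the prompt's words), something like the following Python sketch would do. Everything in it is an assumption for illustration: the `MEMORY` list, the Jaccard similarity, and the position-by-position word swap. It is a caricature of the claim, not a description of what a transformer actually computes.

```python
def tokens(text):
    """Lowercase whitespace tokenizer, enough for a toy example."""
    return text.lower().split()

def similarity(a, b):
    """Jaccard overlap between two token lists."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def answer(prompt, memory):
    p = tokens(prompt)
    # Step 1: find the stored question most similar to the prompt.
    question, stored_answer = max(
        memory, key=lambda qa: similarity(p, tokens(qa[0]))
    )
    # Step 2: wherever the prompt differs from the stored question
    # (compared position by position), swap the prompt's word into
    # the stored answer.
    swaps = {old: new for old, new in zip(tokens(question), p) if old != new}
    return " ".join(swaps.get(word, word) for word in tokens(stored_answer))

# Hypothetical "training data": (question, answer) pairs.
MEMORY = [
    ("why is a pod of whales funny", "pod means a group of whales"),
    ("what is the capital of france", "the capital of france is paris"),
]

print(answer("what is the capital of italy", MEMORY))
# prints "the capital of italy is paris"
```

The output has exactly the right shape and is wrong, which is the kind of convincing-looking answer this mechanism would produce if it were the whole story.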

How is that different from a human answering questions? A human uses pattern matching as part of the process, sure. But they also use, well, all the other abilities that together make up intelligence. They connect the meaningless symbols of the sentence to the mental representations that model the world - the ones pertaining to whatever the question is about.

If I ask a librarian "What is the path integral formulation of quantum mechanics?", and they come back with a textbook and proceed to read the answer from page 345, my reaction is not "Wow, you must be a genius physicist!", it's "Wow, you sure know where to find the right book for any question!". In the same way, I'm impressed with GPT for being a nifty search engine, but then again, Google search does a pretty good job of that already.

I don't know what to tell you. They specifically showed PaLM novel jokes. You're effectively saying that the paper is either mistaken or fraudulent.

In my experience with language models, what they do cannot be reduced to madlibs. But that's obviously not an argument I can prove to you.

Can we agree that if the model can explain structurally novel jokes, then it must have some measure of true understanding?

Understanding of what? What the joke is about? Then no, it has no idea what any of it means. The syntactic structure of jokes? Sure. Feed it 10 thousand jokes that are based on a word found in two otherwise disjoint clusters (pod of whales, pod of TPUs), with a subsequent explanation. It's fair to say it understands that joke format.

If you somehow manage to invent a kind of joke never before seen in the vast training corpus, that alone would be impressive. If PaLM can then explain that joke, I will change my mind about language models, and then probably join the "NNs are magic you guys" crowd, because it wouldn't make any sense.