HN Mirror


by famouswaffles 1117 days ago

>LLM has a good memory. LLM is told (or can infer through relevant information like keywords: "diamond axe") that it is in a minecraft setting. It then looks up a compressed version of a player's guide that was part of its training data. It then uses that data to execute goals.

What about any of what you've just said screams parrot to you?

I mean, here is how the man who coined the term describes it:

A "stochastic parrot", according to Bender, is an entity "for haphazardly stitching together sequences of linguistic forms … according to probabilistic information about how they combine, but without any reference to meaning."

So... what exactly in what you've just stated implies the above meaning?


godelski 1117 days ago

> What about any of what you've just said screams parrot to you?

>>LLM has a good memory.

Pretty much this.

> the man

The woman. Bender is a woman. In fact, 3 of the 4 authors are women, and the 4th's identity is unknown.

> according to probabilistic information about how they combine, but without any reference to meaning.

This is the key part. I don't think the parrot analogy is particularly apt, because we all know a parrot doesn't understand calculus yet can repeat formulas if you teach it. But we have to realize that there are real-world human examples of stochastic parrots, and these are closer to LLMs. If you don't know the phrase "Gell-Mann Amnesia", let me introduce you to it [0]. It is the phenomenon where you hear a speaker or writer talk about a subject you know well, notice them make many mistakes, and then, when they move on to a subject you don't know, you trust them anyway. We can call this writer or speaker a stochastic parrot as well, since they are using words to sound convincing without actually knowing the meaning behind them. It is convincing because it matches the probabilistic patterns a real expert would produce. The difference is in understanding.

But this gets us to a broader topic that is still open: what does it mean to understand? We have no real answer to this. But a widely agreed-upon part of the definition is the ability to generalize: to take knowledge and apply it to new situations. This is why many ML researchers are looking at zero-shot tasks. But in the current paradigm this term has become very muddied and in many cases is being used incorrectly (see my rants about code generation with HumanEval, or about how training on LAION doesn't allow for zero-shot COCO classification).

For this work specifically, we need to evaluate and think about understanding carefully. My critique is that people are treating this as the same kind of "understanding" as when we drop a 10-year-old into Minecraft and they figure out how to play despite never having heard of the game (though they may have played other games before, and Minecraft is many kids' "intro game"). This is clearly not what is happening with GPT. GPT has processed a lot of information about the game before entering its environment. It has read guides on how to play and how to optimize gameplay, it has seen images of the environment (though this version doesn't use pixel information), and it has even read code for bots that farm items. The prompts used in this work tell GPT to use Mineflayer. They also tell it things like that mining iron ore gets you raw iron, along with several other strong hints about how to play the game.

Chain of Thought (CoT) prompts also cast doubt on whether an LLM understands, and arguably make a strong case against understanding (since this is something an understanding creature would consider on its own). CoT feeds recurrent information back into the bot, and this causes statistical (Bayesian) updates. This is not unlike being allowed to reroll a set of dice while also being able to load the dice. You can argue that CoT is part of the thought process of an entity that understands things, but we need to recognize that it is not inherent to how GPT works. You may want to draw an analogy to teaching a child something: they confidently spit out the wrong answer and you ask "are you sure?". But we need to be careful when drawing these parallels and think about them with nuance. The nuance is critical here.

But I want to give you some more intuition about this idea of understanding. We attribute understanding to many creatures, and I'll select a subset that is harder to argue against: mammals and birds. While they don't understand everything at a human level, it is clear that there are certain tasks they understand: they can use tools, quickly adapt to novel environments, and much more. But there's a key clue here: we know that they can all simulate their environments. How? Because they dream. I can't help but think this is part of why Philip K. Dick titled his book the way he did (Do Androids Dream of Electric Sheep?), since the question we're getting at is part of its central theme. But GPT isn't embodied. It does not seem to be able to answer questions about itself, and it has shown clear difficulties in simulating any environment. While it gets some hits, it gets more misses.

TLDR: see this prompt and ChatGPT's response: https://i.imgur.com/sK4pLw0.png

[0] https://www.epsilontheory.com/gell-mann-amnesia/

Fwiw: Bard answers similarly to ChatGPT: https://i.imgur.com/CmWsf9X.png https://i.imgur.com/QJXIBDl.png https://i.imgur.com/zSGjYss.png

Side note: I'm often critical of Bender myself. I think she is far too harsh on LLMs and is promoting doomerism that isn't helpful. But this has nothing to do with the meaning of "stochastic parrot". We should also recognize that the term has changed and adapted as it has entered the lexicon, just like every other word or phrase in human language.


usaar333 1117 days ago

> TLDR: see this prompt and ChatGPT's response

And wow, that's GPT4.

I've had similar thoughts to you. One day it feels like amazing intelligence, but the next it seems like an extremely good but naive pattern matcher.

I've experienced similar GPT-4 disappointments trying to teach it concepts not well represented in its training data (it does badly) or asking it to make modifications to programs that go outside its training data (e.g. making a tax calculator compute long-term capital gains tax correctly). It ends up doing much worse than a human.


godelski 1117 days ago

To be clear, that's ChatGPT, not GPT4. GPT4 should be better, but it is still in limited beta and I haven't bothered joining. Note that 3.5-turbo (the API) is worse.

> They both weigh the same amount, which is 1 pound.

It is clearly a strong example of Gell-Mann Amnesia when we can't trust it to tell us the difference between two simple things, yet we trust it to tell us complicated things.

It is also a clear example of how it is a stochastic parrot (it doesn't understand what it is saying): it even explains its reasoning, and that reasoning is not self-consistent. We wouldn't expect an entity that understands something to be wildly inconsistent over such a short span. Clearly the model is relying more on the statistics of the question (the pattern and frequency with which those words appear in that order) than on the actual content and meaning of those words.

Despite this, I still frequently use LLMs. I just scrutinize them and don't trust them. Utility and trust are different things and people seem to be forgetting this.


YeGoblynQueenne 1117 days ago

>> To be clear, that's ChatGPT, not GPT4. GPT4 should be better, but it is still limited beta and I haven't bothered joining.

Well, I can predict the next few token sequences you're about to get in response to your comment: "That's why you got that answer, GPT4 is so much better," etc.

Regarding your earlier comment about burnout, you're not alone. I stayed on HN because I could have the occasional good discussion about AI. There were always conversations that quickly got saturated with low-knowledge comments, the inevitable effect of discussions about "intelligence", "understanding" and other things everybody has some experience with but for which there is no commonly accepted formal definition that can keep the discussion focused. That kind of comment used to be more or less constant in quantity and I could usually still find the informed users' corner. After ChatGPT went viral though, those kinds of comments have really exploded and most conversations have no more space for reasoned and knowledgeable exchange.

>> LLM has a good memory.

Btw, intuitively, neural nets are memories. That's why they need so much data and still can't generalise (but, well, they need all that data because they can't generalise). There's a paper by Pedro Domingos arguing so with actual maths, but a) it's a single paper, b) I haven't read it carefully and c) it's got an "Xs are Ys" type of title so I refuse to link it. With LLMs you can sort of see them working like random-access memories when you have to tweak a prompt carefully to get a specific result (much like how you only get the right data from a relational database when you make the right query). I think if we trained an LLM to generate prompts for an LLM, we'd find that the prompts that maximise the probability of a certain answer look nothing like the chatty, human-like prompts people compose when speaking to a chatbot; they might even look random and incomprehensible to humans.


godelski 1117 days ago

Well, it is good to know I'm not alone. These are strange times indeed. I often think one of the great filters for civilizations is overcoming a biological mechanism that designs brains to think simply (cheap compute; complexity is often unnecessary for survival objectives) and then advancing to a level of civilization where a significant share of its problems require going beyond first- and second-order approximations (which happens once most challenges have been solved to first- and second-order approximations). Unless a species is able to rewire its consciousness, I don't see how this wouldn't be an issue for any species, but maybe I'm thinking too narrowly or with too much bias.


usaar333 1117 days ago

> GPT4 should be better, but it is still limited beta and I haven't bothered joining.

Ah, my bad. GPT4 via Bing (Precise) gets it correct:

> A kilogram of feathers weighs more than a pound of bricks. A kilogram is a metric unit of mass and is equivalent to 2.20462 pounds. So, a kilogram is heavier than a pound.
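For anyone who wants to sanity-check the arithmetic in that answer, here is a minimal Python sketch. It assumes the international avoirdupois pound, which is defined as exactly 0.45359237 kg; the constant and function names are just for illustration.

```python
# Assumption: international avoirdupois pound, defined as exactly 0.45359237 kg.
KG_PER_POUND = 0.45359237

def kg_to_lb(mass_kg: float) -> float:
    """Convert a mass in kilograms to pounds."""
    return mass_kg / KG_PER_POUND

feathers_lb = kg_to_lb(1.0)  # a kilogram of feathers, expressed in pounds
bricks_lb = 1.0              # a pound of bricks

print(f"1 kg of feathers = {feathers_lb:.5f} lb")
print("feathers heavier" if feathers_lb > bricks_lb else "bricks heavier or equal")
```

Running it prints `1 kg of feathers = 2.20462 lb` followed by `feathers heavier`, which matches the Bing answer and contradicts the "both weigh 1 pound" answer above.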
