| HN Mirror



	by bartwr 1118 days ago
	This is kind of amazing given that obviously, GPT-4 never contained such tasks and data. I think it puts an end to the claim that "language models are only stochastic parrots and cannot do any reasoning". No, this is 100% a form of reasoning and furthermore, learning that is more similar to how humans learn (gradient-less). I still don't understand it and it blows my mind - how such properties emerge just from compressing the task of next word prediction. (Yes, I know this is an oversimplification, but not a misleading one).

5 comments

godelski 1118 days ago

> GPT-4 never contained such tasks and data

No task, but we need to be clear that it did have the data. Remember that GPT4 was trained on a significant portion of the internet, which likely includes sites like Reddit and game fact websites. So there's a good chance GPT4 learned the tech tree and was trained on data about how to progress up that tree, including speed runner discussions. (also remember that as of March GPT4 is also trained on images, not just text)

What data it was trained on is very important and I'm not sure why we keep coming back to this issue. "GPT4 has no zero-shot data" should be as drilled into everyone's head as sayings like "correlation does not equate to causation" and "garbage in, garbage out". Maybe people do not know this data is on the internet? But I'm surprised if the average HN user thought that way.

This doesn't make the paper less valuable or meaningful. But it is more like watching a 10 year old who's read every chess book and played against computers beat (or do really well against) a skilled player vs a 10 year old who's never heard of chess beating a skilled player. Both are still impressive, one just seems like magic though and should raise suspicion.


notamy 1118 days ago

Looking at the paper, as I understand it they're using Mineflayer https://github.com/PrismarineJS/mineflayer and passing parts of the state of the game as JSON to the LLM that are used for code generation to complete tasks.

> I still don't understand it and it blows my mind - how such properties emerge just from compressing the task of next word prediction.

The Mineflayer library is very popular, so all the relevant tasks are likely already extant in the training data.
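To make the setup described above concrete, here is a minimal sketch of turning a JSON snapshot of game state into a code-generation prompt. It is not taken from the paper or from Mineflayer: `build_prompt`, the state fields, and the prompt wording are all assumptions made up for illustration.

```python
import json


def build_prompt(state: dict, task: str) -> str:
    """Embed a JSON snapshot of game state in a code-generation prompt.

    Hypothetical helper: the prompt wording is an assumption, not the
    prompt used in the paper.
    """
    state_json = json.dumps(state, indent=2, sort_keys=True)
    return (
        "You control a Minecraft bot through the Mineflayer JavaScript API.\n"
        f"Current state (JSON):\n{state_json}\n"
        f"Task: {task}\n"
        "Reply with a single async JavaScript function that completes the task."
    )


# Made-up state snapshot; the field names are illustrative only.
example_state = {
    "biome": "plains",
    "inventory": {"oak_log": 3},
    "nearby_blocks": ["grass_block", "oak_log"],
}

prompt = build_prompt(example_state, "Craft a wooden pickaxe")
print(prompt)
```

The point of the sketch is that everything the model sees is text: the state arrives as serialized JSON and the answer comes back as code for an existing, widely documented library.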


emptysongglass 1118 days ago

You declare:

> I think it puts an end to the claim that "language models are only stochastic parrots and cannot do any reasoning".

But then two sentences later:

> I still don't understand it and it blows my mind

I've said this before to others and it bears repeating because your line of thinking is dangerous (not sudden AI cataclysm): to feel so totally qualified to make such a statement armed with ignorance, not knowledge, is the cause of mass hysteria around LLMs.

What is happening can be understood without resorting to the sort of magical thinking that ascribes agency to these models.


godelski 1118 days ago

> What is happening can be understood without resorting to the sort of magical thinking that ascribes agency to these models.

This is what has (as an ML researcher) made me hate conversations around ML/AI recently. Honestly, it is getting me burned out on an area of research I truly love and am passionate about. A lot of technical people are openly and confidently talking about magic, talking as if the model didn't have access to relevant information (the "zero-shot myth") and other such nonsense. It is one thing for a layman to say these things, but another to see them in the top comment on a website aimed at people with high tech literacy. And even worse to see it coming from my research peers. These models are impressive, and I don't want to diminish that (I shouldn't have to say this sentence but here we are), but we have to be clear that the models aren't magic either. We know a lot about how they work too. They aren't black boxes, they are opaque, and every day we reduce the opacity.

For clarity: here's an alternative explanation to the results that's even weaker than the paper's settings (explains autogpt better). LLM has a good memory. LLM is told (or can infer through relevant information like keywords: "diamond axe") that it is in a minecraft setting. It then looks up a compressed version of a player's guide that was part of its training data. It then uses that data to execute goals. This is still an impressive feat! But it is still in line with the stochastic parrot paradigm. I'm not sure why people think stochastic parrots aren't impressive. They are.

But right now ML/AI culture feels like anime or weed culture. The people it attracts make you feel embarrassed to be associated with it.


two_in_one 1110 days ago

> But it is still in line with the stochastic parrot paradigm.

What makes us different from 'stochastic parrots'? Or where creativity, which machines don't have by definition, begins and ends?

There are a bunch of philosophical questions, but LLMs are more than just parrots. They develop multi-level pattern recognition. And they can solve multi-step problems which they have never seen before. Maybe they have seen each individual step, but not the whole combination. Selecting the right combination out of zillions is not exactly 'parroting'. It doesn't matter what we call it, it has extremely high potential in the real physical world. Looks like that is the near future.

We are witnessing the emergence of 'Verbose AI'. IMHO. Which is more than just NLP.


famouswaffles 1118 days ago

>LLM has a good memory. LLM is told (or can infer through relevant information like keywords: "diamond axe") that it is in a minecraft setting. It then looks up a compressed version of a player's guide that was part of its training data. It then uses that data to execute goals.

What about any of what you've just said screams parrot to you ?

I mean here is how the man who coined the term describes it.

A "stochastic parrot", according to Bender, is an entity "for haphazardly stitching together sequences of linguistic forms … according to probabilistic information about how they combine, but without any reference to meaning."

So..what exactly from what you've just stated implies the above meaning ?
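For reference, the most literal reading of that definition is a toy bigram generator like the sketch below: it stitches words together purely from counts of which word followed which, with no representation of meaning. This is an illustration of the quoted wording only, not a claim about how GPT-4 works; the corpus and the function names are made up.

```python
import random
from collections import defaultdict


def train_bigrams(text: str) -> dict:
    """Record, for each word, the list of words that followed it."""
    table = defaultdict(list)
    words = text.split()
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def generate(table: dict, start: str, length: int, seed: int = 0) -> str:
    """Stitch a sequence together by sampling observed followers."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(length - 1):
        followers = table.get(out[-1])
        if not followers:  # the last word never had a follower in the corpus
            break
        out.append(rng.choice(followers))
    return " ".join(out)


# Tiny made-up corpus, for illustration only.
corpus = "mine wood to craft a pickaxe then mine stone to craft a furnace"
table = train_bigrams(corpus)
sample = generate(table, "mine", 8)
print(sample)
```

Every adjacent pair in the output occurred in the corpus, so the text can look locally plausible even though nothing in the program refers to what a pickaxe or a furnace is.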


godelski 1118 days ago

> What about any of what you've just said screams parrot to you ?

>>LLM has a good memory.

Pretty much this.

> the man

The woman. Bender is a woman. In fact, 3 of the 4 authors are women and the 4th has an unknown identity.

> according to probabilistic information about how they combine, but without any reference to meaning.

This is the part. I don't think the analogy of the parrot is particularly apt because we all know that the parrot doesn't understand calculus but is able to repeat formulas if you teach it. But we have to realize that there are real world human examples of stochastic parrots, and these are more akin to LLMs. If you don't know the phrase "Murray Gell-Mann Amnesia", let me introduce you to it[0]. It is the concept that you can hear a speaker/writer talk about a subject matter you're familiar with, see them make many mistakes, and then, when they move to a subject matter you are not familiar with, you trust them. We can call this writer or speaker a stochastic parrot as well since they are using words to sound convincing but they do not actually know the meaning behind the words. It is convincing because it matches the probabilistic information that a real expert may use. The difference is in understanding.

But this gets us to a topic at large that is still open: what does it mean to understand? We have no real answer to this. But a well agreed upon part of the definition is the ability to generalize: to take knowledge and apply it to new situations. This is why many ML researchers are looking at zero-shot tasks. But in the current paradigm this term has become very muddied and in many cases is being used incorrectly (you can see my rants about code generation with HumanEval or how training on LAION doesn't allow for zero shot COCO classification).

For this work specifically, we need to evaluate and think about understanding carefully. The critique I am giving is that people are acting as if this is similar "understanding" to how we may drop a 10 year old into Minecraft and that 10 year old can figure out how to play the game despite never hearing about the game before (though maybe they have played games before. But Minecraft is also many kids' "intro game"). This is clearly not what is happening with GPT. GPT has processed a lot of information on the game before entering its environment. It has read guides on how to play and how to optimize game play, it has seen images of the environment (though this version doesn't use pixel information), and it has even read code for bots that will farm items. The prompts used in this work tell GPT to use Mineflayer. They also tell it things like that mining iron ore gets you raw iron and several other strong hints of how to play the game. Chain of Thought (CoT) prompts also bring into doubt the understanding nature of an LLM, and really provide a strong case against understanding (since this is something an understanding creature considers). CoT is adding recurrent information into the bot and this causes statistical (Bayesian) updates. This is not dissimilar from allowing you to reroll a set of dice while also being able to load the dice. You can argue that CoT is part of the thought process for an entity that understands things, but we need to recognize that this is not inherent to how GPT does things. You may want to draw an analogy to teaching a child something, having them confidently spit out the wrong answer, and then saying "are you sure?", but we need to be careful when drawing these parallels and think with nuance and care. The nuance is critical here.

But I want to give you some more intuition into this understanding idea. We attribute understanding to many creatures and I'll select a subset that is more difficult to argue against: mammals and birds. While they don't understand everything at the level of humans, it is clear that there are certain tasks they understand: they are able to use tools, quickly adapt to novel environments, and much more. But there's a key clue here: we know that they can all simulate their environments. How? Because they dream. I can't help but think this is part of the inspiration for Philip K Dick naming his book that way, since the question we're getting at is part of its central theme. But as for GPT, it isn't embodied. It does not seem to be able to answer questions about itself and it has shown clear difficulties in simulating any environment. While it can make some hits, it makes more misses.

TLDR: see this prompt and ChatGPT's response: https://i.imgur.com/sK4pLw0.png

[0] https://www.epsilontheory.com/gell-mann-amnesia/

Fwiw: Bard answers similarly to ChatGPT: https://i.imgur.com/CmWsf9X.png https://i.imgur.com/QJXIBDl.png https://i.imgur.com/zSGjYss.png

Side Note: I'm often critical of Bender myself. I think she is far too harsh on LLMs and is promoting doomerism that isn't helpful. But this has nothing to do with the meaning of Stochastic Parrot. We should also recognize that the term has changed as it has entered the lexicon and adapted. Just like every other word/phrase in human language.


usaar333 1118 days ago

> TLDR: see this prompt and ChatGPT's response

And wow, that's GPT4.

I've had similar thoughts to yours. It feels like amazing intelligence one day, but the next it seems like an extremely good, but naive pattern matcher.

I've experienced similar GPT-4 disappointments trying to teach it concepts not well covered in the training data (it does badly) or making modifications to programs that go outside the training data (e.g. making a tax calculator calculate long term capital gains tax correctly); it ends up doing much worse than a human.


godelski 1118 days ago

To be clear, that's ChatGPT, not GPT4. GPT4 should be better, but it is still in limited beta and I haven't bothered joining. Note that 3.5-turbo (the API) is worse.

> They both weigh the same amount, which is 1 pound.

It is clearly a strong example of Murray Gell-Mann Amnesia when we can't trust it to tell us the difference between two simple things but we trust it to tell us complicated things.

It is also a clear example of how it is a stochastic parrot -- doesn't understand what it is saying -- as it even explains the reasoning and is not self-consistent. We wouldn't expect an entity that can understand something to be wildly inconsistent in this short a period of time. Clearly the model is relying more on the statistics of the question (the pattern and frequency with which most of those words appear in that order) than on the actual content and meaning of those words.

Despite this, I still frequently use LLMs. I just scrutinize them and don't trust them. Utility and trust are different things and people seem to be forgetting this.


asperous 1118 days ago

While I do believe LLMs can perform some reasoning, I'm not sure this is the best example as all the reasoning you would ever need for Minecraft is well contained in the data set used to train it. A lot has been written about Minecraft.

To me, it would be more convincing if they developed an entirely new game with somewhat novel and arbitrary rules and saw if the embodied agent could learn this game.


chinchilla2020 1112 days ago

I read through the code and tried it out for 15 mins.

It's a hard-coded program that can do a text search for its own hard-coded, human-implemented functions. Apparently it can string those functions together, but doesn't do it correctly.

https://github.com/MineDojo/Voyager/tree/main/voyager/contro...

20 minutes of light reading through the repository pretty much dispels any notions that this is a self-learning system that can reason and think. It's the same Minecraft automation we have been seeing for a decade now, with a chatbot text search built in.
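As a rough picture of the pattern being described (a text search over hand-written functions), here is a minimal sketch. It is not code from the Voyager repository, and it does not claim to match how that project retrieves skills; `SKILLS`, `find_skill`, and the three functions are hypothetical names made up for illustration.

```python
from typing import Callable, Optional


# Hand-written "skills"; each one stands in for a human-implemented function.
def mine_block(name: str) -> str:
    return f"mined {name}"


def craft_item(name: str) -> str:
    return f"crafted {name}"


def smelt_item(name: str) -> str:
    return f"smelted {name}"


# Plain-text descriptions mapped to the functions they describe.
SKILLS: dict = {
    "mine block with tool": mine_block,
    "craft item at crafting table": craft_item,
    "smelt item in furnace": smelt_item,
}


def find_skill(query: str) -> Optional[Callable[[str], str]]:
    """Return the skill whose description shares the most words with the query."""
    query_words = set(query.lower().split())

    def overlap(description: str) -> int:
        return len(query_words & set(description.split()))

    best = max(SKILLS, key=overlap)
    return SKILLS[best] if overlap(best) > 0 else None


skill = find_skill("smelt raw iron")
result = skill("raw_iron") if skill else "no matching skill"
print(result)
```

Stringing skills together is then just calling `find_skill` once per step of a plan; nothing in the lookup itself decides whether the resulting sequence is correct.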
