HN Mirror


by HarHarVeryFunny 1214 days ago

I wonder if you could comment on this (related to question of how far ahead these "LLM"s are planning).

This is Wharton professor Ethan Mollick playing with the new Bing chat, which seems considerably more advanced than ChatGPT (based on GPT-4 perhaps?).

Here he asks it to write something using Kurt Vonnegut's rules of writing.

https://twitter.com/emollick/status/1626084142239649792

It seems hard to explain how Bing/GPT could have generated the Vonnegut-inspired cake story, having ingested the rules, without planning the whole thing before generating the first word.

It seems there's an awful lot more going on internally in these models than a mere word by word autoregressive generation. It seems the prompt (in this case including Vonnegut's rules) is ingested and creates a complex internal state that is then responsible for the coherency and content of the output. The fact that it necessarily has to generate the output one word at a time seems to be a bit misleading in terms of understanding when the actual "output prediction" takes place.

1 comments

gbasin 1214 days ago

There is "long range" dependence, it's just only on the prompt: the conversation with the user and the hidden header (e.g. "Answer as ChatGPT, an intelligent AI, state your reasons, be succinct, etc."). That ends up being enough.


HarHarVeryFunny 1214 days ago

Sure, but the point being discussed is that despite the word by word output, the output does not appear to be "chosen" on a word by word basis. OP investigated the case where the word "an" anticipates the following word ("an apple" vs "a pear").
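To make the "an" vs "a" point concrete, here is a toy sketch with made-up numbers (the nouns, probabilities, and function names are all hypothetical, nothing here comes from a real model). If the internal state already favors certain nouns, the probability of the article is roughly the sum over the nouns that would follow it, so the article token already reflects what comes next:

```python
# Toy illustration only: the nouns and probabilities are invented.
# They stand in for "what the model intends to say next".
noun_probs = {"apple": 0.5, "orange": 0.2, "pear": 0.2, "banana": 0.1}


def article_for(noun: str) -> str:
    # Crude spelling-based rule, good enough for this tiny vocabulary.
    return "an" if noun[0] in "aeiou" else "a"


def article_distribution(probs: dict) -> dict:
    # Probability of each article = total probability of the nouns
    # that would require that article.
    dist = {"a": 0.0, "an": 0.0}
    for noun, p in probs.items():
        dist[article_for(noun)] += p
    return dist


dist = article_distribution(noun_probs)
print(dist)
```

With these invented numbers, "an" comes out more likely than "a" purely because "apple" and "orange" dominate the noun distribution, even though the noun itself has not been emitted yet.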


sharemywin 1214 days ago

I see 2 options:

1. We don't know what they (the coding layer between Bing and GPT) look up and store as a prompt, aka working memory.

2. It can do the equivalent of receiving its own prompt silently.

I've seen that with code, it outputs the steps for the code and then writes the code.

So there's some kind of plan and execute going on. Maybe it can do that in the model somehow.


hackinthebochs 1214 days ago

>So there's some kind of plan and execute going on. Maybe it can do that in the model somehow.

The simple answer is that the internal state that picks the next token is stable over iterations so that the model can follow a consistent plan over multiple token outputs. Then as the plan "unfolds" in the output tokens, these tokens help stabilize the plan further, thus creating consistency over long generations.


psychphysic 1214 days ago

It's chosen by the n-gram, and randomly so; that does suggest it is completing the text a word at a time.


HarHarVeryFunny 1214 days ago

Did you check the Vonnegut writing rules example I posted at the top of this thread? In particular, look at Bing/GPT's explanation of how its cake story matches up to Vonnegut's rules. It's hard to imagine how it could have come up with such a coherent story, checking all the rules, if it was only conceiving of its continuing story on a word by word basis. It's not as if sentence #1 matches rule number 1, sentence 2 matches rule number 2, etc. It seems there had to be some holistic composition for it to do that.

Note too that despite the output being sampled from a distribution based on a "randomness" temperature, there are many cases where what it is trying to say so much constrains the output that certain words/synonyms/concepts are all but forced.
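A rough sketch of the temperature point, using a plain softmax over invented logits (the numbers and names are assumptions for illustration, not taken from any real model). When one option is strongly favored, raising the temperature barely changes which token gets sampled; when the options are close, it matters much more:

```python
import math


def softmax_with_temperature(logits, temperature):
    # Divide logits by the temperature, then apply a numerically
    # stable softmax (subtract the max before exponentiating).
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Invented logits: "peaked" = context all but forces one token,
# "flat" = several tokens are about equally plausible.
peaked = [10.0, 1.0, 0.5]
flat = [1.2, 1.0, 0.9]

for t in (0.5, 1.0, 1.5):
    p_peaked = softmax_with_temperature(peaked, t)
    p_flat = softmax_with_temperature(flat, t)
    # Print the probability of the top token in each case.
    print(t, round(p_peaked[0], 4), round(p_flat[0], 4))
```

In the peaked case the top token keeps nearly all of the probability mass across these temperatures, which is the sense in which the word is "all but forced" regardless of the sampling randomness.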


theGnuMe 1214 days ago

Kurt Vonnegut is a conditional sub space of the embedding vectors.


hackinthebochs 1214 days ago

It's easy to see that it's not just doing one token at a time but is anticipating future tokens. Consider the context of a Q&A. The response might start with any of a number of words; exactly which word depends on what comes after. But if it randomly chooses the wrong word, it will either be forced to complete the wrong answer, or be backed into a corner and engage in circumlocutions to course-correct. This doesn't happen in practice for recent big models.
