Hacker News


	by psychphysic 1214 days ago
	It's chosen by the n-gram, and randomly so; that does suggest it is completing the text a word at a time.
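To make the "chosen by the n-gram, randomly" picture concrete, here is a toy bigram sampler that extends text one word at a time by picking a random observed successor of the previous word. It is a hypothetical illustration: the function names `build_bigrams` and `generate` and the tiny corpus are made up. GPT-style models do not literally look up an n-gram table (they compute each next-token distribution from the whole context), so this only shows the word-at-a-time sampling loop the comment describes.

```python
import random
from collections import defaultdict


def build_bigrams(corpus: str) -> dict[str, list[str]]:
    """Map each word to every word that followed it in the corpus (duplicates kept)."""
    words = corpus.split()
    table: dict[str, list[str]] = defaultdict(list)
    for prev, nxt in zip(words, words[1:]):
        table[prev].append(nxt)
    return dict(table)


def generate(table: dict[str, list[str]], start: str, max_words: int,
             rng: random.Random) -> list[str]:
    """Extend the text one word at a time, looking only at the previous word."""
    out = [start]
    while len(out) < max_words:
        followers = table.get(out[-1])
        if not followers:  # no observed continuation: stop early
            break
        # Duplicates in the list make this a frequency-weighted random choice.
        out.append(rng.choice(followers))
    return out


# Hypothetical toy corpus, for illustration only.
corpus = "the cat sat on the mat and the cat ate the cake"
table = build_bigrams(corpus)
print(" ".join(generate(table, "the", 8, random.Random(0))))
```

Because the sampler sees only the last word, nothing in it can plan ahead; any global structure in its output is accidental.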


HarHarVeryFunny 1214 days ago

Did you check the Vonnegut writing-rules example I posted at the top of this thread? In particular, look at Bing/GPT's explanation of how its cake story matches up to Vonnegut's rules. It's hard to imagine how it could have come up with such a coherent story, satisfying all the rules, if it were only conceiving of its continuing story on a word-by-word basis. It's not as if sentence #1 matches rule #1, sentence #2 matches rule #2, and so on. It seems there had to be some holistic composition for it to do that.

Note, too, that even though the output is sampled from a distribution shaped by a "randomness" temperature, there are many cases where what it is trying to say constrains the output so much that certain words, synonyms, or concepts are all but forced.
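The temperature point can be sketched with a temperature-scaled softmax over next-token scores. This is a minimal sketch under stated assumptions: the token list and logit values are made up to represent a context that strongly favors one word, and `softmax_with_temperature` / `sample_token` are hypothetical helper names, not any model's API.

```python
import math
import random


def softmax_with_temperature(logits: list[float], temperature: float) -> list[float]:
    """Turn raw scores into probabilities; higher temperature flattens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample_token(tokens: list[str], logits: list[float], temperature: float,
                 rng: random.Random) -> str:
    """Draw one token from the temperature-scaled distribution."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(tokens, weights=probs, k=1)[0]


# Hypothetical next-token scores where the context strongly favors one word.
tokens = ["cake", "pie", "bread", "banana"]
logits = [12.0, 4.0, 3.0, 1.0]

for t in (0.7, 1.0, 2.0):
    p = softmax_with_temperature(logits, t)
    print(f"T={t}: P(cake)={p[0]:.4f}")
```

Temperature reshapes the distribution but cannot revive a word the context has all but ruled out: with these made-up scores, "cake" stays above 95% even at T=2.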


theGnuMe 1214 days ago

Kurt Vonnegut is a conditional subspace of the embedding vectors.


hackinthebochs 1214 days ago

It's easy to see that it's not just doing one token at a time but is anticipating future tokens. Consider a Q&A context. The response might start with any of a number of words, and exactly which word fits depends on what comes after. If it randomly chose the wrong first word, it would either be forced to complete the wrong answer or be backed into a corner and resort to circumlocutions to course-correct. In practice, this doesn't happen with recent big models.
