| HN Mirror



	by ehnto 1251 days ago
	I find it surprising that you can display any of it before the whole thing is done, since I would expect information dependencies between the start and the finish of a sentence or paragraph. I have yet to really look into how these models work; they are black boxes to me.

2 comments

trekkie1024 1251 days ago

From what I understand, these models generate the response one token (roughly a word or a piece of a word) at a time. Every time you see a new word appear at the end, the model is taking into consideration the entire chat history + its own answer so far to generate that next token.
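
A rough sketch of that loop, in case it helps. This is not how any real model is implemented: `TOY_TABLE` and `next_token` are made-up stand-ins for the model, and a real one would score every vocabulary entry using the whole context rather than looking up the last token. The shape of the loop is the point: each new token is appended to the context and can be shown right away.

```python
from typing import Iterator, List

# Hypothetical stand-in for a language model: maps the last token to the next one.
# A real model would use the entire context, not just the final token.
TOY_TABLE = {
    "<start>": "the",
    "the": "cat",
    "cat": "sat",
    "sat": "<end>",
}


def next_token(context: List[str]) -> str:
    """Pick the next token given everything so far (prompt + partial answer)."""
    return TOY_TABLE.get(context[-1], "<end>")


def generate(prompt: List[str], max_tokens: int = 10) -> Iterator[str]:
    """Yield tokens one at a time, feeding each one back in as input."""
    context = list(prompt)
    for _ in range(max_tokens):
        token = next_token(context)
        if token == "<end>":
            break
        context.append(token)  # the new token becomes input for the next step
        yield token            # it can be displayed immediately, before the rest exists


streamed = list(generate(["<start>"]))
print(streamed)
```

Because `generate` yields as it goes, a UI can print each token the moment it is produced, which is why the text appears incrementally.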


ehnto 1248 days ago

Thanks for the comment, that's so fascinating since it seems to put limitations on thinking in general. A human for example can imagine future possibilities concurrently while speaking and correct themselves as they go.

It doesn't seem to map well to how I put together a thought either, but admittedly I wouldn't really know how the mechanics of my brain do it. Maybe it's not so different, just with some auxiliary modules bolted on, ha.


tux3 1251 days ago

Check out the illustrated transformer: https://jalammar.github.io/illustrated-transformer/

tl;dr: It decodes the output one word at a time, but at each step it can focus on any mix of words from the input via the attention mechanism. So the output token n can't depend on future output token n+1 in GPT, but it can attend to any of the input tokens.
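
A minimal sketch of the "can't look at the future" part, assuming the usual causal masking: position i only gets attention weight on positions 0..i, and everything later is zeroed out. The function name `causal_attention_weights` and the all-zero scores are just for illustration; a real model computes the scores from learned query and key vectors.

```python
import math
from typing import List


def causal_attention_weights(scores: List[List[float]]) -> List[List[float]]:
    """Softmax each row over positions 0..i only; later positions get weight 0."""
    n = len(scores)
    weights = []
    for i in range(n):
        visible = scores[i][: i + 1]          # position i sees itself and the past
        peak = max(visible)                   # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (n - i - 1)  # future is masked out
        weights.append(row)
    return weights


# Illustrative scores: all equal, so each row spreads evenly over what it can see.
scores = [[0.0] * 4 for _ in range(4)]
weights = causal_attention_weights(scores)
for row in weights:
    print(row)
```

With equal scores, the first row puts all its weight on position 0, the second splits evenly over positions 0 and 1, and so on; no row ever puts weight on a later position.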
