| HN Mirror


	by united893 1249 days ago
	The response takes a long time to generate. The user could either sit and stare at a blank response or start reading in real time as the response is generated.

2 comments

ehnto 1249 days ago

I find it surprising that you can display any of it before the whole thing is done, since I would expect information dependencies between the start and the end of a sentence or paragraph. I have yet to really look into how these models work; they are black boxes to me.

trekkie1024 1249 days ago

From what I understand, these models generate the response one token (roughly one word) at a time. Every time you see a new word appear at the end, the model is taking into consideration the entire chat history plus its own answer so far to generate that next token.
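
A minimal sketch of that loop, to show why partial output can be displayed as it is produced. Everything here is hypothetical: `toy_next_token` is a stand-in that replays a canned reply, whereas a real model would score every vocabulary token given the prompt plus the answer so far. Only the shape of the loop is the point.

```python
from typing import Iterator, List

# Canned reply used by the toy "model" below (an assumption for illustration).
REPLY = ["Tokens", "arrive", "one", "at", "a", "time", "<end>"]


def toy_next_token(prompt: List[str], generated: List[str]) -> str:
    """Hypothetical stand-in for a model call.

    A real model would condition on prompt + generated; this toy only
    uses how many tokens have been produced so far.
    """
    return REPLY[len(generated)]


def stream(prompt: List[str], max_tokens: int = 20) -> Iterator[str]:
    """Yield tokens one by one so a caller can display them immediately."""
    generated: List[str] = []
    for _ in range(max_tokens):
        token = toy_next_token(prompt, generated)
        if token == "<end>":
            break
        generated.append(token)  # the answer so far feeds the next step
        yield token


if __name__ == "__main__":
    for token in stream(["Why", "stream", "?"]):
        print(token, end=" ", flush=True)
    print()
```

Because each token is final once emitted, nothing is lost by showing it right away; the alternative is buffering the whole reply and printing it at the end.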

ehnto 1246 days ago

Thanks for the comment, that's so fascinating since it seems to put limitations on thinking in general. A human for example can imagine future possibilities concurrently while speaking and correct themselves as they go.

It doesn't seem to map well to how I put together a thought either, but admittedly I wouldn't really know how the mechanics of my brain do it. Maybe it's not so different, just with some auxiliary modules bolted on, ha.

tux3 1249 days ago

Check out the illustrated transformer: https://jalammar.github.io/illustrated-transformer/

tl;dr: It decodes the output one word at a time, but at each step it can focus on any mix of words from the input via the attention mechanism. So output token n can't depend on future output token n+1 in GPT, but it can attend to any of the input tokens.
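
A rough sketch of the "can't depend on future tokens" part, assuming a decoder-only setup where the prompt and the output so far sit in one sequence. The function name `causal_attention_weights` and the raw `scores` input are made up for illustration; real models compute the scores from learned query and key vectors.

```python
import math
from typing import List


def causal_attention_weights(scores: List[List[float]]) -> List[List[float]]:
    """Softmax each row of a square score matrix under a causal mask.

    Row i gets non-zero weight only on positions 0..i, so position i
    can look at itself and earlier tokens but never at later ones.
    """
    n = len(scores)
    weights: List[List[float]] = []
    for i in range(n):
        visible = scores[i][: i + 1]          # drop all future positions
        peak = max(visible)                   # subtract max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        row = [e / total for e in exps] + [0.0] * (n - i - 1)
        weights.append(row)
    return weights


if __name__ == "__main__":
    # Three positions with equal raw scores everywhere.
    for row in causal_attention_weights([[0.0] * 3 for _ in range(3)]):
        print(row)
```

The zeros above the diagonal are what make token-by-token streaming possible: earlier positions never need to be recomputed when a later token is added.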

mastadoum 1249 days ago

I did not expect that. When iterating with smaller models like nanoGPT, even though the output is one token at a time, it did not feel like it would take half a second between each of them, but I guess that's what happens with billion-parameter models.