| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by sva_ 61 days ago
	Theoretically you can start generating away from token 0 ('unconditional generation'). But I agree, there is definitely some setup here. edit: Now that I think of it, actually you need some special token like <\|begin_of_text\|>

2 comments

computerphage 61 days ago

Do you? What's the technical detail here? Why can't you get the model's prediction, even for that first token?

link

sva_ 61 days ago

I mean mathematically you need at least one vector to propagate through the network, don't you? That would be a one hot encoding of the starting token. Actually interesting to think about what happens if you make that vector zero everywhere.

In the matmul, it'd just zero out all parameters. In older models, you'd still have bias vectors but I think recent models don't use those anymore. So the output would be zero probability for each token, if I'm not mistaken.

link

maplethorpe 61 days ago

Isn't the prompt then whatever token is token zero?

link