| HN Mirror



	by tednoob 773 days ago
	Is this method used during training? Seems to me there could be a point to only backpropagate when the model is wrong?

2 comments

leodriesch 772 days ago

The model is always wrong in that sense, since it predicts a probability distribution over all possible tokens, while the target assigns 100% probability to one token and 0 to all others.
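To make that concrete, here is a minimal sketch in plain Python, assuming a softmax output and a cross-entropy loss (the thread does not say which loss is used, so that is an assumption, and the logits are made-up numbers). It shows that the loss stays above zero even when the most likely token is the correct one:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, target):
    # Loss against a one-hot target: -log of the probability of the true token.
    return -math.log(softmax(logits)[target])

# Hypothetical logits for a 3-token vocabulary; token 0 is the correct one.
logits = [4.0, 1.0, 0.5]
target = 0

probs = softmax(logits)
predicted = probs.index(max(probs))
loss = cross_entropy(logits, target)

print("predicted token:", predicted)
print("probability of correct token:", probs[target])
print("loss:", loss)
```

The top-1 prediction is right, but the probability of the correct token is below 1, so the loss is positive and there is still something to backpropagate.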


zwaps 773 days ago

I mean, this is implicit in backpropagation. You need to store gradients anyway, but if you get to zero loss then the gradients are zero too and you are just done.
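A rough sketch of what "implicit" could mean here, again assuming softmax with cross-entropy (an assumption, not something stated in the thread). For that loss, the gradient with respect to the logits is the predicted distribution minus the one-hot target, so it shrinks on its own as the model gets more confident in the right token:

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability before exponentiating.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def grad_wrt_logits(logits, target):
    # For softmax + cross-entropy, d(loss)/d(logit_i) = p_i - onehot_i.
    probs = softmax(logits)
    return [p - (1.0 if i == target else 0.0) for i, p in enumerate(probs)]

def grad_norm(logits, target):
    # Euclidean norm of the gradient with respect to the logits.
    return math.sqrt(sum(g * g for g in grad_wrt_logits(logits, target)))

# Hypothetical logits: the correct token (index 0) gets an increasingly large logit.
scales = (1.0, 5.0, 20.0)
norms = [grad_norm([s, 0.0, 0.0], 0) for s in scales]

for s, n in zip(scales, norms):
    print(f"correct-token logit {s:5.1f} -> gradient norm {n:.3e}")
```

The gradient norm falls toward zero as the prediction approaches the one-hot target, so no explicit "skip if correct" rule is needed for the update to fade out.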
