

	by kiratp 902 days ago
	> For any model, the loss curve going down could mean it’s learning, or could mean it’s overfitting, we don’t know which without looking at validation loss, which is like a second set of test data the model hasn’t seen before.

	You want to look at validation accuracy.
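Here is a minimal sketch of the check the quoted passage describes. It assumes one training-loss value and one validation (held-out) loss value are logged per epoch. The helper name `first_overfit_epoch` and its `patience` rule are hypothetical and chosen only for illustration. The helper flags the point where training loss keeps falling while validation loss stops improving.

```python
from typing import Optional, Sequence


def first_overfit_epoch(
    train_losses: Sequence[float],
    val_losses: Sequence[float],
    patience: int = 2,
) -> Optional[int]:
    """Hypothetical helper: return the first epoch index at which validation
    loss has failed to beat its best value for `patience` consecutive epochs
    while training loss was still decreasing, or None if that never happens."""
    best_val = float("inf")
    stale = 0  # consecutive epochs of "train down, validation not improving"
    for epoch, (train, val) in enumerate(zip(train_losses, val_losses)):
        train_improved = epoch > 0 and train < train_losses[epoch - 1]
        if val < best_val:
            # Held-out loss still improving: the model is (still) learning.
            best_val = val
            stale = 0
        elif train_improved:
            # Training loss falls but held-out loss does not: overfitting signal.
            stale += 1
            if stale >= patience:
                return epoch
        else:
            # Both curves stalled; not the overfitting pattern, so reset.
            stale = 0
    return None


# Made-up loss curves for illustration only.
train = [2.0, 1.5, 1.2, 1.0, 0.8, 0.6]
val = [2.1, 1.7, 1.5, 1.6, 1.7, 1.9]
print(first_overfit_epoch(train, val))  # 4
```

Looking at the training curve alone, all six epochs look like progress. Only the validation curve shows that learning stopped generalizing after epoch 2.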


minimaxir 902 days ago

Accuracy is a bad metric for LLMs, especially since an LLM tokenizer can have thousands of "classes": 32,000 in the case of TinyLlama.
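To make this concrete: for next-token prediction, accuracy is top-1 classification over the whole vocabulary, so a near miss scores the same as a wild miss. The following is a minimal sketch. It assumes raw per-position scores (logits) and integer target ids. The `token_accuracy` name and the toy 5-token vocabulary are illustrative assumptions.

```python
from typing import Sequence


def token_accuracy(logits: Sequence[Sequence[float]], targets: Sequence[int]) -> float:
    """Fraction of positions whose highest-scoring token id equals the target id.

    `logits` holds one score per vocabulary entry at each position; for a
    tokenizer like TinyLlama's, each inner list would have 32,000 entries.
    """
    correct = 0
    for scores, target in zip(logits, targets):
        predicted = max(range(len(scores)), key=lambda i: scores[i])  # argmax
        correct += int(predicted == target)
    return correct / len(targets)


# Toy 5-token vocabulary with made-up scores. Position 0 is a near miss
# (1.9 vs 2.0 for the wrong token), yet it counts exactly like a wild miss: zero.
logits = [[0.1, 2.0, 1.9, 0.0, 0.0],
          [3.0, 0.0, 0.0, 0.0, 0.0]]
targets = [2, 0]
print(token_accuracy(logits, targets))  # 0.5
```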


kiratp 902 days ago

I guess it comes down to whether your use case has a single correct answer vs. multiple possible ones. For example, a lot of what we do has one and only one correct sequence of tokens. Need to look at both, but so much of the learning material out there just focuses on loss. YMMV.
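When there is a single correct answer, one option is to score whole sequences instead of individual tokens. The following is a minimal sketch. It assumes pre-tokenized predictions and references that are aligned position by position. Both function names are hypothetical. A single wrong token leaves the token-level score high, while the sequence-level score drops.

```python
from typing import Sequence


def exact_match_rate(predictions: Sequence[Sequence[int]],
                     references: Sequence[Sequence[int]]) -> float:
    """Fraction of sequences whose predicted token ids equal the reference exactly."""
    hits = sum(list(p) == list(r) for p, r in zip(predictions, references))
    return hits / len(references)


def token_match_rate(predictions: Sequence[Sequence[int]],
                     references: Sequence[Sequence[int]]) -> float:
    """Fraction of reference positions where the predicted id matches.
    Assumes predictions are aligned position by position with references."""
    correct = total = 0
    for p, r in zip(predictions, references):
        total += len(r)
        correct += sum(a == b for a, b in zip(p, r))
    return correct / total


preds = [[5, 9, 2], [7, 7, 1]]  # made-up token ids
refs = [[5, 9, 2], [7, 8, 1]]
print(exact_match_rate(preds, refs))             # 0.5
print(round(token_match_rate(preds, refs), 3))   # 0.833
```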


minimaxir 901 days ago

That is already accounted for with categorical cross-entropy loss.
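This minimal sketch shows how cross-entropy already captures "was the correct token predicted." The per-token loss is the negative log-probability the model assigns to the one correct token, so it falls only as that probability rises. The code uses pure Python for illustration, and the function name is an assumption. Frameworks such as PyTorch's `torch.nn.functional.cross_entropy` compute the same quantity from logits and target ids, averaging over positions by default.

```python
import math
from typing import Sequence


def cross_entropy(scores: Sequence[float], target: int) -> float:
    """Categorical cross-entropy at one position: -log softmax(scores)[target].

    Computed as logsumexp(scores) - scores[target], subtracting the max score
    first so math.exp cannot overflow.
    """
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - scores[target]


# Both made-up score vectors put the highest score on the correct token (id 0),
# so top-1 accuracy is 1.0 for each, but the loss still tells them apart.
confident = [4.0, 0.0, 0.0]
hesitant = [1.0, 0.9, 0.0]
print(round(cross_entropy(confident, 0), 2))  # 0.04
print(round(cross_entropy(hesitant, 0), 2))   # 0.82
```

Unlike accuracy, this loss keeps decreasing as the model grows more confident in the right token, and it penalizes a near miss less than a wild one.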
