| HN Mirror



	by sidr 593 days ago
	Cross-entropy is not the KL divergence. There is an additional term in cross-entropy which is the entropy of the data distribution (i.e., independent of the model). So, you're right in that minimizing one is equivalent to minimizing the other. https://stats.stackexchange.com/questions/357963/what-is-the...
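
A quick numerical check of the relationship described above, as a minimal sketch: the distributions `p`, `q_a`, and `q_b` are made-up illustrative values, and the helper functions are written out by hand rather than taken from any library. It suggests that the gap between cross-entropy and KL divergence is the entropy of `p`, whichever model distribution is plugged in.

```python
import math

def entropy(p):
    # H(p) = -sum_i p_i * log(p_i), skipping zero-probability terms
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i)
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical data distribution and two hypothetical model distributions.
p = [0.7, 0.2, 0.1]
q_a = [0.5, 0.3, 0.2]
q_b = [0.6, 0.25, 0.15]

# The difference H(p, q) - KL(p || q) should not depend on q.
gap_a = cross_entropy(p, q_a) - kl_divergence(p, q_a)
gap_b = cross_entropy(p, q_b) - kl_divergence(p, q_b)

print(gap_a, gap_b, entropy(p))
```

Since the gap depends only on `p`, any `q` that lowers one quantity lowers the other by the same amount.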

1 comment

skzv 592 days ago

Yes, you are totally correct, but I believe this term is omitted from the cross-entropy loss function that is used in machine learning, because it is a constant that does not contribute to the optimization?

Please correct me if I'm wrong.
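
One way to probe this question numerically, as a rough sketch with made-up values (`one_hot`, `soft`, and `q` are illustrative, and the helpers are hand-written, not a library API): with one-hot labels the entropy of the label distribution is zero, so cross-entropy and KL divergence come out equal. With soft labels the entropy term is a nonzero constant that remains part of the cross-entropy value.

```python
import math

def cross_entropy(p, q):
    # H(p, q) = -sum_i p_i * log(q_i), skipping zero-probability terms
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    # KL(p || q) = sum_i p_i * log(p_i / q_i)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Hypothetical model prediction over three classes.
q = [0.2, 0.7, 0.1]

# One-hot label (class 1): entropy of the label distribution is 0.
one_hot = [0.0, 1.0, 0.0]
ce_hard = cross_entropy(one_hot, q)
kl_hard = kl_divergence(one_hot, q)
nll = -math.log(q[1])  # the usual negative log-likelihood of the true class

# Soft label: entropy is positive, so cross-entropy exceeds KL by that constant.
soft = [0.1, 0.8, 0.1]
ce_soft = cross_entropy(soft, q)
kl_soft = kl_divergence(soft, q)

print(ce_hard, kl_hard, nll)
print(ce_soft - kl_soft)
```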
