| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by osanseviero 938 days ago
	Yes, you're correct. I tried to connect a common training problem (gradient explosion and vanishing gradient) with the issue of softmax being sensitive to large values. I agree it's misleading/inaccurate, so will rewrite that part. That said, the whole neural network will be sensible to large values, so it won't be fixed by a numerically stable softmax. The normalization is a key aspect for the network to work.