|
|
|
|
|
by osanseviero
891 days ago
|
|
Yes, you're correct. I tried to connect a common training problem (gradient explosion and vanishing gradient) with the issue of softmax being sensitive to large values. I agree it's misleading/inaccurate, so will rewrite that part. That said, the whole neural network will be sensible to large values, so it won't be fixed by a numerically stable softmax. The normalization is a key aspect for the network to work. |
|