by jaidhyani
1156 days ago
This is true in general, but not in the use case they presented. If they had explained why a normalized distribution is useful, it would have made sense, but they just describe this as a pick-the-top-answer next-word predictor. Softmax preserves the ordering of the scores, so the top answer is the same with or without it, which makes the softmax superfluous.
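
To make that concrete, here is a minimal sketch in plain Python. The `logits` values are made up for illustration, and `softmax` and `argmax` are small helper functions written for this example, not anything from the article under discussion. It only shows that picking the top answer gives the same index before and after the softmax.

```python
import math

def softmax(logits):
    # Subtract the max before exponentiating for numerical stability;
    # this does not change the result.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def argmax(values):
    # Index of the largest value (first one wins on ties).
    return max(range(len(values)), key=lambda i: values[i])

# Hypothetical unnormalized scores for four candidate next words.
logits = [2.0, -1.0, 0.5, 3.2]
probs = softmax(logits)

# The winner is the same either way, because exp() is strictly increasing
# and dividing by a positive constant keeps the ordering.
assert argmax(logits) == argmax(probs)
```

The normalization starts to matter once you sample from the distribution, compute a cross-entropy loss, or compare confidences across inputs, none of which a pure pick-the-top predictor does.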