by thomasahle
1055 days ago
> it might be that the model trains all of the inputs to become very negative

It still can't do this because of L2 regularization / weight decay. If two vectors have norm 1, their inner product is at least -1, so with 2000 vectors the sum of the exponentials is still at least 2000 * e^(-1) =~ 736. I'm not saying it's theoretically impossible, but you would have to try _really_ hard to make it happen.
Of course it might have some other serious repercussions.
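
A quick numeric check of the bound above, as a rough sketch rather than anything from a real model. It assumes the query and all 2000 key vectors are exactly unit norm (weight decay only encourages small norms, it does not enforce this), and the dimension of 64 and the helper names are made up for illustration.

```python
import math
import random

def random_unit_vector(dim, rng):
    # Sample a Gaussian vector and rescale it to norm 1.
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def exp_sum(query, keys):
    # Sum of exp(<query, key>) over all keys, i.e. a softmax denominator.
    return sum(math.exp(sum(q * k for q, k in zip(query, key))) for key in keys)

n = 2000   # number of vectors, as in the comment
dim = 64   # hypothetical dimension, chosen only for this example
rng = random.Random(0)

query = random_unit_vector(dim, rng)
keys = [random_unit_vector(dim, rng) for _ in range(n)]

lower_bound = n * math.exp(-1)   # about 735.76
random_case = exp_sum(query, keys)

# Worst case: every key points exactly opposite the query, so each
# inner product is -1 and the sum sits right at the bound.
worst_case = exp_sum(query, [[-q for q in query]] * n)

print(lower_bound, random_case, worst_case)
```

The worst case only reaches the bound when every key is the exact negation of the query, which is the "try really hard" scenario; random unit vectors stay well above it.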