|
|
|
|
|
by p1esk
1067 days ago
|
|
He's questioning the statement: "I don't think [the trick] ever did very much", because no one has yet looked at whether the trick helps reducing outliers in very large models. If it does help with this, as the blog author believes, then it is indeed a very useful trick. |
|
>> because no one has yet looked at whether the trick helps reducing outliers in very large models
Given a softmax version doing exactly as the blog post says is baked into a google library (see this thread), and you can set it as a parameter in a pytorch model (see this thread), this claim seems off. "Let's try X, oh, X doesn't do much, let's not write a paper about it" is extremely common for many X.