by Wehrdo
1055 days ago
Here is an article that explains more about the outliers that emerge in large transformer models, which is the problem this modified softmax is proposed to fix: https://timdettmers.com/2022/08/17/llm-int8-and-emergent-fea... The fact that these outliers only emerge in larger models is likely one reason the author hasn't actually tried it.