by ggerganov
1054 days ago
> I don't recall the details exactly, but I don't think it ever did very much.

How would you have known whether the trick actually reduces the outliers in the weights? Even if the transformer's overall quality does not improve, having fewer outliers as a result is very beneficial for more accurate quantization of the data.

The "how" is pretty straightforward.
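The thread doesn't spell out the measurement at this point. Below is a minimal sketch of one way to check both halves of the argument: a tailedness statistic (excess kurtosis) shows whether a weight block has outliers, and the reconstruction error of a quantized block shows why that matters. It assumes a simple symmetric absmax scheme with 32-weight blocks and 4-bit levels. That is a simplification for illustration, not llama.cpp's actual quantization formats, and all function names are hypothetical.

```python
import math
import random


def absmax_quantize(block, bits=4):
    """Symmetric absmax quantization (illustrative scheme, not a real library's format).

    The scale is chosen so the largest |w| in the block maps to the top integer level,
    which is exactly why one outlier can waste most of the available levels.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 levels on each side for 4-bit
    amax = max(abs(w) for w in block)
    scale = amax / qmax if amax > 0 else 1.0
    # round() is Python's round-half-to-even; clamp to the representable range
    q = [max(-qmax, min(qmax, round(w / scale))) for w in block]
    return q, scale


def rms_error(block, bits=4):
    """Root-mean-square error between a block and its quantize/dequantize round trip."""
    q, scale = absmax_quantize(block, bits)
    rec = [v * scale for v in q]
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(block, rec)) / len(block))


def excess_kurtosis(xs):
    """Excess kurtosis: about 0 for Gaussian-like data, large when a few values dominate."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / var ** 2 - 3.0


if __name__ == "__main__":
    rng = random.Random(0)
    block = [rng.gauss(0.0, 0.02) for _ in range(32)]  # hypothetical "well-behaved" weights
    with_outlier = block[:-1] + [1.0]                   # same block with one large outlier
    for name, b in [("no outlier", block), ("with outlier", with_outlier)]:
        print(f"{name}: kurtosis={excess_kurtosis(b):.2f}  rms_error={rms_error(b):.5f}")
```

With the outlier present, the scale is set by that single value, so the ordinary weights mostly round to zero and the block's error grows by a large factor. A change that produces fewer such outliers can therefore help quantization even if the unquantized model's quality is unchanged.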