by blake929
1054 days ago
Some very interesting discussion of outlier features and quantization: https://timdettmers.com/2022/08/17/llm-int8-and-emergent-fea...

* Outlier values are used to prune values.
* Transformers seem to undergo a "phase shift" in how outlier features are treated around 6.7B parameters. This could complicate research on removing them.

Maybe you and Tim Dettmers would have a lot to talk about :)