by blake929 1054 days ago
Some very interesting discussion of outlier features and quantization: https://timdettmers.com/2022/08/17/llm-int8-and-emergent-fea...

* Outlier values are used to prune values.
* Transformers seem to undergo a "phase shift" in how outlier features are treated around 6.7B parameters. This could complicate research on removing them.

Maybe you and Tim Dettmers would have a lot to talk about :)