
Hacker News

by Havoc 690 days ago

> This represents an almost 8x compression ratio for every weight matrix in the transformer model
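To make the quoted ratio concrete, here is a rough sketch. It assumes an fp16 (16-bit) baseline, ternary values {-1, 0, +1} packed at 2 bits each (four per byte), and one fp16 scale per matrix. The quote doesn't state the storage layout, so the packing scheme and the helper names `pack_ternary`, `unpack_ternary` and `compression_ratio` are hypothetical. Under these assumptions the ratio lands just under 8x. The information-theoretic bound for ternary is log2(3) ≈ 1.58 bits per weight, or about 10.1x over fp16.

```python
import math

def pack_ternary(weights):
    """Pack ternary weights {-1, 0, +1} into bytes, 4 weights per byte (2 bits each)."""
    code = {-1: 0b00, 0: 0b01, 1: 0b10}  # 0b11 is unused (hypothetical encoding)
    packed = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= code[w] << (2 * j)
        packed.append(byte)
    return bytes(packed)

def unpack_ternary(packed, n):
    """Inverse of pack_ternary: recover the first n ternary weights."""
    decode = {0b00: -1, 0b01: 0, 0b10: 1}
    out = []
    for byte in packed:
        for j in range(4):
            if len(out) == n:
                break
            out.append(decode[(byte >> (2 * j)) & 0b11])
    return out

def compression_ratio(rows, cols, baseline_bits=16, scale_bits=16):
    """fp16 storage size divided by 2-bit packed storage plus one per-matrix scale."""
    n = rows * cols
    dense_bits = n * baseline_bits
    packed_bits = math.ceil(n / 4) * 8 + scale_bits
    return dense_bits / packed_bits

print(compression_ratio(4096, 4096))  # slightly below 8 due to the scale
print(16 / math.log2(3))              # ideal ternary bound over fp16, about 10.1
```

The 2-bit layout is chosen here for simplicity: it wastes one of the four codes, which is why it sits below the log2(3) bound, but it keeps unpacking to a shift and a mask.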

Surely you’d need more ternary weights, though, to achieve the same performance?

A bit like how a Q4 quant is smaller than a Q8 but also tangibly worse, so the “compression” isn’t really like-for-like.

Either way, excited about more ternary progress.
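The Q4 vs Q8 trade-off can be illustrated with a simplified round-trip experiment: quantize a synthetic weight vector to signed 4-bit and 8-bit integers with a single absmax scale, dequantize, and measure the mean squared error. This is a per-tensor symmetric scheme chosen for brevity, not the block-wise formats that Q4/Q8 usually refers to (e.g. llama.cpp's), and the synthetic weights are an assumption.

```python
import math

def quantize_dequantize(values, bits):
    """Symmetric absmax quantization to signed `bits`-bit integers, then back to floats."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit, 127 for 8-bit
    scale = max(abs(v) for v in values) / qmax
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Synthetic stand-in for a row of a weight matrix (assumption: real weights differ).
weights = [0.1 * math.sin(0.37 * i) for i in range(1024)]

for bits in (8, 4):
    err = mse(weights, quantize_dequantize(weights, bits))
    print(f"{bits}-bit: {16 // bits}x smaller than fp16, MSE = {err:.2e}")
```

Halving the bit width doubles the compression but makes the quantization step roughly 18 times coarser (127/7), which is why the smaller quant isn't a drop-in equivalent.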


We do quantization-aware training, so the model should minimize the loss w.r.t. the ternary weights, hence no degradation in performance.
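A minimal sketch of what minimizing the loss w.r.t. the ternary weights can look like. The idea is to keep full-precision latent weights, ternarize them in the forward pass, and use a straight-through estimator (treating the quantizer's gradient as the identity) to push the gradient back to the latent weights. The absmean ternarization follows the BitNet b1.58 recipe, which is an assumption since the reply doesn't name a scheme. `ternarize` and `train_qat` are hypothetical names, and the model is a toy linear regression rather than a transformer.

```python
def ternarize(w, eps=1e-8):
    """Absmean ternary quantization: divide by mean |w|, round, clip to {-1, 0, +1}."""
    scale = sum(abs(v) for v in w) / len(w) + eps
    q = [max(-1, min(1, round(v / scale))) for v in w]
    return q, scale

def train_qat(xs, ys, w_init, lr=0.1, steps=200):
    """Quantization-aware training of a linear model y = w . x with ternary weights."""
    w = list(w_init)  # latent full-precision weights, only used during training
    n = len(xs)
    for _ in range(steps):
        q, scale = ternarize(w)
        wq = [scale * qi for qi in q]  # effective weights used in the forward pass
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(wq, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2.0 * err * xi / n  # d(MSE)/d(wq_i)
        # Straight-through estimator: pretend d(wq)/d(w) is the identity, so the
        # gradient w.r.t. the ternary weights is applied to the latent weights.
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return ternarize(w)  # only the ternary codes and one scale are kept

# Toy target that is exactly representable as scale * {+1, -1, 0}.
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
ys = [0.5, -0.5, 0.0]
q, scale = train_qat(xs, ys, w_init=[0.3, -0.3, 0.05])
print(q, round(scale, 3))  # -> [1, -1, 0] 0.5
```

The toy target is chosen to be exactly representable as a scaled ternary vector, so the sketch converges to near-zero loss; after training, the latent float weights are discarded and only the ternary codes plus one scale are stored.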