Hacker News
by Havoc 644 days ago
> This represents an almost 8x compression ratio for every weight matrix in the transformer model

Surely you’d need more ternary weights, though, to achieve the same performance?

A bit like how a Q4 quant is smaller than a Q8 but also tangibly worse, so the “compression” isn’t really like-for-like.

Either way, excited about more ternary progress.
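Here is a rough illustration of where a figure like the quoted "almost 8x" can come from. If the baseline weights are 16-bit (fp16/bf16) and each ternary weight is stored in 2 bits, the ratio is 16 / 2 = 8x, minus a little for any per-matrix scale factor. The thread does not state the baseline precision or the storage format, so both are assumptions here, as are the function names and the 2-bit code assignment. Also, 2 bits is not the information-theoretic minimum: a ternary value carries log2(3) ≈ 1.58 bits, so a denser packing could get closer to 10x.

```python
import math
import numpy as np

def pack_ternary(w: np.ndarray) -> np.ndarray:
    """Pack ternary weights {-1, 0, +1} into 2 bits each, 4 weights per byte.

    Hypothetical encoding (an assumption): -1 -> 0b00, 0 -> 0b01, +1 -> 0b10.
    """
    flat = np.asarray(w).astype(np.int8).ravel()
    if not np.isin(flat, (-1, 0, 1)).all():
        raise ValueError("weights must be in {-1, 0, +1}")
    codes = (flat + 1).astype(np.uint8)                 # {-1, 0, 1} -> {0, 1, 2}
    pad = (-codes.size) % 4                             # pad to a multiple of 4 with the code for 0
    codes = np.concatenate([codes, np.ones(pad, dtype=np.uint8)]).reshape(-1, 4)
    packed = codes[:, 0] | (codes[:, 1] << 2) | (codes[:, 2] << 4) | (codes[:, 3] << 6)
    return packed.astype(np.uint8)

def unpack_ternary(packed: np.ndarray, n: int) -> np.ndarray:
    """Inverse of pack_ternary: recover the first n ternary weights as int8."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11          # extract each 2-bit field
    return codes.ravel()[:n].astype(np.int8) - 1        # {0, 1, 2} -> {-1, 0, 1}

def compression_ratio(n_weights: int, baseline_bits: int = 16, scale_bits: int = 16) -> float:
    """Assumed 16-bit matrix size vs. 2-bit packed ternary weights plus one per-matrix scale."""
    baseline = n_weights * baseline_bits
    ternary = math.ceil(n_weights / 4) * 8 + scale_bits
    return baseline / ternary

r = compression_ratio(4096 * 4096)   # a hypothetical 4096 x 4096 weight matrix
ideal = 16 / math.log2(3)            # entropy bound: ~1.585 bits per ternary weight
print(f"2-bit packing: {r:.6f}x, entropy-optimal: {ideal:.2f}x")
```

The 2-bit layout trades a bit of density for simple, aligned unpacking (four weights per byte, fixed shifts), which is why it is a common first choice over base-3 packing.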


We do quantization-aware training, so the model is trained to minimize the loss with respect to the ternary weights themselves; hence there is no degradation in performance from the quantization step.
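A toy sketch of what "minimizing the loss w.r.t. the ternary weights" can look like, assuming the common straight-through-estimator (STE) recipe. Full-precision latent weights are kept for the optimizer, the forward pass uses only their ternarized copy, and the gradient with respect to the ternary weights is applied to the latent weights unchanged. The linear-regression setup, the fixed scale of 1, and all function names are hypothetical illustrations, not the authors' training code; ternary LLM recipes such as BitNet b1.58 also rescale each matrix (e.g. by its mean absolute value).

```python
import numpy as np

def ternarize(w: np.ndarray) -> np.ndarray:
    """Round latent weights to the nearest value in {-1, 0, +1} (fixed scale of 1, a simplifying assumption)."""
    return np.clip(np.round(w), -1.0, 1.0)

def mse(X: np.ndarray, w_eff: np.ndarray, y: np.ndarray) -> float:
    return float(np.mean((X @ w_eff - y) ** 2))

def train_qat(X: np.ndarray, y: np.ndarray, steps: int = 300, lr: float = 0.05):
    """Toy quantization-aware training of a linear model with an STE."""
    w_latent = np.zeros(X.shape[1])              # full-precision "shadow" weights, updated by the optimizer
    for _ in range(steps):
        q = ternarize(w_latent)                  # forward pass sees only ternary weights
        residual = X @ q - y
        grad_q = 2.0 * X.T @ residual / len(y)   # gradient of the MSE w.r.t. the ternary weights
        w_latent -= lr * grad_q                  # STE: treat d(q)/d(w_latent) as 1 and reuse grad_q
    return ternarize(w_latent), w_latent

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0, 0.0, 1.0, 0.0, -1.0, 1.0, 0.0])
X = rng.normal(size=(500, w_true.size))
y = X @ w_true                                   # noiseless targets produced by ternary weights
q, _ = train_qat(X, y)
print(q, mse(X, q, y))
```

Because the loss is always evaluated through the ternary weights, training optimizes the quantized model directly instead of rounding a finished full-precision model afterwards, which is the contrast with a post-hoc Q4/Q8 quant that the reply is drawing.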