by Havoc
644 days ago
> This represents an almost 8x compression ratio for every weight matrix in the transformer model.

Surely you’d need more ternary weights, though, to achieve the same performance? A bit like how a Q4 quant is smaller than a Q8 but also tangibly worse, so the “compression” isn’t really like-for-like.

Either way, excited about more ternary progress.
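Some rough arithmetic for the point above. This is only a sketch under assumptions: it supposes the 8x figure comes from packing each ternary weight into 2 bits against a 16-bit baseline (the quote doesn't say so), and the 2x "needs more weights" multiplier is purely hypothetical. The function name `effective_compression` is made up for illustration.

```python
def effective_compression(baseline_bits: float,
                          ternary_bits: float,
                          param_multiplier: float) -> float:
    """Storage ratio of a baseline model vs. a ternary model.

    Assumes storage is just (number of weights * bits per weight),
    ignoring scales and other metadata. `param_multiplier` is how many
    times more weights the ternary model would need to match quality
    (a hypothetical knob, not a measured number).
    """
    return baseline_bits / (ternary_bits * param_multiplier)


# Assumption: 16-bit baseline, ternary weights packed 2 bits each.
same_size = effective_compression(16, 2, 1.0)    # 8.0, the headline ratio
bigger_model = effective_compression(16, 2, 2.0) # 4.0 if 2x the weights were needed
print(same_size, bigger_model)
```

So if a ternary model did need more parameters to match, the headline ratio would shrink proportionally, which is the like-for-like concern.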