by areddyyt 643 days ago
We don't achieve peak compression efficiency because more complex weight unpacking mechanisms kill throughput.

To be more explicit, the weight matrix's values belong to the set {-1, 0, 1}. When we use two bits to encode each weight, one of the four possible states goes unused:

10 => 1, 01 => 0, 00 => -1, 11 => ?
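Below is a minimal Python sketch of the 2-bit scheme above. It assumes weights are packed four per byte with the lowest bits first. That byte layout and the names `pack_2bit` and `unpack_2bit` are hypothetical and do not come from the comment. Unpacking a weight takes one shift and one mask, and the 0b11 code is never produced:

```python
# Mapping from the comment: 10 => 1, 01 => 0, 00 => -1; code 0b11 is never emitted.
ENCODE = {1: 0b10, 0: 0b01, -1: 0b00}
DECODE = {0b10: 1, 0b01: 0, 0b00: -1}

def pack_2bit(weights):
    """Pack ternary weights four per byte, 2 bits each.

    Hypothetical layout: the first weight of each group sits in the lowest bits.
    """
    out = bytearray()
    for i in range(0, len(weights), 4):
        byte = 0
        for j, w in enumerate(weights[i:i + 4]):
            byte |= ENCODE[w] << (2 * j)
        out.append(byte)
    return bytes(out)

def unpack_2bit(packed, n):
    """Recover n ternary weights using one shift and one mask per weight."""
    weights = []
    for byte in packed:
        for j in range(4):
            if len(weights) == n:
                break  # ignore padding slots in the final byte
            code = (byte >> (2 * j)) & 0b11
            weights.append(DECODE[code])  # a KeyError here would mean the unused 0b11 state appeared
    return weights
```

For example, `pack_2bit([1, 0, -1, 1, 0])` yields `bytes([134, 1])`, and `unpack_2bit` on that result with `n=5` returns the original list. Each weight occupies 2 bits even though only three of the four codes are ever used.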

I think selecting the optimal radix economy will matter more on custom silicon, where we can design circuits and instructions that rapidly decompress weights or operate on the compressed weights directly.
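The comment doesn't say which denser encoding was considered. As an assumption, one common way to get closer to log2(3) ≈ 1.585 bits per weight is to treat the weights as base-3 digits. Because 3^5 = 243 ≤ 256, five of them fit in one byte, giving 1.6 bits per weight. The minimal Python sketch below illustrates that idea. The names `pack_base3`, `unpack_base3`, and `TRITS_PER_BYTE` are hypothetical:

```python
import math

TRITS_PER_BYTE = 5  # 3**5 = 243 <= 256, so five ternary digits fit in one byte

def pack_base3(weights):
    """Pack ternary weights five per byte as base-3 digits (w + 1 in {0, 1, 2}).

    Hypothetical layout: the first weight of each group is the least significant digit.
    """
    out = bytearray()
    for i in range(0, len(weights), TRITS_PER_BYTE):
        value = 0
        for w in reversed(weights[i:i + TRITS_PER_BYTE]):
            value = value * 3 + (w + 1)
        out.append(value)  # at most 242, so it always fits in a byte
    return bytes(out)

def unpack_base3(packed, n):
    """Recover n weights; each one costs a divmod by 3 instead of a shift and mask."""
    weights = []
    for value in packed:
        for _ in range(TRITS_PER_BYTE):
            if len(weights) == n:
                break  # ignore padding digits in the final byte
            value, digit = divmod(value, 3)
            weights.append(digit - 1)
    return weights

print(f"2-bit packing: {8 / 4:.3f} bits/weight")
print(f"base-3, 5 per byte: {8 / TRITS_PER_BYTE:.3f} bits/weight")
print(f"theoretical minimum: {math.log2(3):.3f} bits/weight")
```

Run as-is, it prints 2.000, 1.600, and 1.585 bits per weight. The density gain costs a chain of dependent divisions by 3 for each byte, compared with an independent shift and mask per weight in the 2-bit layout. That is the kind of unpacking overhead that can hurt throughput on general-purpose hardware, and that dedicated instructions on custom silicon could in principle absorb.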