by yalok
576 days ago
So basically the idea is to pack 3 ternary weights (-1, 0, 1) into 5 bits instead of 6 (3^3 = 27 combinations fit within 2^5 = 32), but they compare the results with an fp16 model, which would use 48 bits for those 3 weights... And the speedup comes from reduced memory I/O, offset a bit by the need to unpack these weights before using them... Did I get this right?
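To make the arithmetic concrete, here is a rough sketch of one way such packing could work, assuming a plain base-3 encoding. The names `pack3` and `unpack3` are made up for illustration, and the actual implementation may well lay the bits out differently.

```python
def pack3(weights):
    """Pack three ternary weights (each -1, 0, or 1) into one code in 0..26.

    Assumption: simple base-3 positional encoding, chosen for illustration.
    """
    assert len(weights) == 3 and all(w in (-1, 0, 1) for w in weights)
    a, b, c = (w + 1 for w in weights)  # shift each weight to a digit 0..2
    return a + 3 * b + 9 * c            # at most 26, so it fits in 5 bits


def unpack3(code):
    """Recover the three ternary weights from a code produced by pack3."""
    assert 0 <= code < 27
    return [code % 3 - 1, code // 3 % 3 - 1, code // 9 - 1]


if __name__ == "__main__":
    packed = pack3([-1, 0, 1])
    print(packed, unpack3(packed))
```

That works out to 5/3, or about 1.67 bits per weight, versus 2 bits for the naive encoding and 16 bits for fp16. The divisions and modulo operations in `unpack3` are the unpacking cost mentioned above.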
If it were to catch on, then perhaps we'd see Intel, AMD, and ARM adding instructions optimized for ternary math?