Hacker News
by yalok 576 days ago
So basically the idea is to pack 3 ternary weights (-1, 0, 1) into 5 bits instead of 6 (3^3 = 27 combinations fit within 2^5 = 32), but they compare the results with an fp16 model, which would use 48 bits for those 3 weights…

And the speedup comes from reduced memory I/O, partly offset by the need to unpack these weights before using them…
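
To make the packing concrete, here is a minimal sketch of one way to fit 3 ternary weights into a 5-bit code: treat them as base-3 digits. This is only my assumption about the encoding, not necessarily what the paper or its kernels do, and the names `pack3` and `unpack3` are made up for illustration.

```python
# Hypothetical helpers: base-3 encoding is an assumed scheme, not the paper's confirmed layout.

def pack3(w0, w1, w2):
    """Pack three ternary weights (-1, 0, 1) into one code in 0..26 (fits in 5 bits)."""
    for w in (w0, w1, w2):
        if w not in (-1, 0, 1):
            raise ValueError(f"not a ternary weight: {w!r}")
    # Shift each weight to a base-3 digit (0, 1, 2) and combine the digits.
    return (w0 + 1) + 3 * (w1 + 1) + 9 * (w2 + 1)


def unpack3(code):
    """Recover the three ternary weights from a code produced by pack3."""
    if not 0 <= code < 27:
        raise ValueError(f"code out of range: {code!r}")
    d0 = code % 3
    d1 = (code // 3) % 3
    d2 = code // 9
    # Shift each base-3 digit back to the range -1..1.
    return d0 - 1, d1 - 1, d2 - 1


if __name__ == "__main__":
    code = pack3(1, 0, -1)
    print(code, unpack3(code))
    # Storage per weight: 5/3 bits packed, versus 2 bits naive and 16 bits for fp16.
    print(5 / 3, 6 / 3, 48 / 3)
```

The unpack step is the extra work mentioned above: a few integer divisions and remainders (or a 32-entry lookup table) per group of 3 weights.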

Did I get this right?

1 comment

Yeah, that seems to be the case. Though I suspect Microsoft is interested in implementing something like a custom RISC-V CPU with an ALU tuned for this ternary math, plus custom vector/matrix instructions. Something like that could save them a lot of power in their data centers.

If it were to catch on, then perhaps we'd see Intel, AMD, and ARM adding math ops optimized for ternary math?
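
For anyone wondering why hardware tuned for ternary math might save power: with weights limited to -1, 0, and 1, a dot product needs no multiplications, only additions and subtractions. A rough sketch of that idea follows; `ternary_dot` is a name I made up for illustration, and this is plain Python rather than anything resembling a real kernel or instruction set.

```python
# Hypothetical illustration: a multiplication-free dot product with ternary weights.

def ternary_dot(weights, activations):
    """Dot product where every weight is -1, 0, or 1, using only add and subtract."""
    if len(weights) != len(activations):
        raise ValueError("weights and activations must have the same length")
    total = 0
    for w, x in zip(weights, activations):
        if w == 1:
            total += x      # weight +1: add the activation
        elif w == -1:
            total -= x      # weight -1: subtract the activation
        elif w != 0:
            raise ValueError(f"not a ternary weight: {w!r}")
        # weight 0: the activation is skipped entirely
    return total


if __name__ == "__main__":
    print(ternary_dot([1, 0, -1, 1], [5, 7, 2, 3]))
```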

My dream is to see ternary support at the HW wire level; that'd be even more power efficient, and the transistor count might be lower...