by brucethemoose2 1089 days ago
There is some overhead from the quantization, and right now the operations themselves are sometimes done at higher precision than the weights in RAM.
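To make that concrete, here is a rough pure-Python sketch of where the overhead comes from. It assumes a simple symmetric scheme with one scale per group and two 4-bit values packed per byte. The function names (`quantize_4bit`, `dequantize_4bit`, `dot_q4`) are made up for illustration and are not any particular library's API, and real kernels are far more optimized than this.

```python
def quantize_4bit(weights):
    """Symmetric 4-bit quantization of one group: returns (packed bytes, scale)."""
    # One float scale per group; fall back to 1.0 if every weight is zero.
    scale = max(abs(w) for w in weights) / 7 or 1.0
    # Map each weight to an integer in [-8, 7], then shift to [0, 15] for storage.
    q = [max(-8, min(7, round(w / scale))) + 8 for w in weights]
    if len(q) % 2:
        q.append(8)  # pad with a stored zero so values pair up evenly
    # Two 4-bit values per byte: this is the form the weights take in RAM.
    packed = bytes(lo | (hi << 4) for lo, hi in zip(q[0::2], q[1::2]))
    return packed, scale


def dequantize_4bit(packed, scale, n):
    """Unpack n weights back to floats. This step is the quantization overhead."""
    out = []
    for b in packed:
        out.append(((b & 0x0F) - 8) * scale)
        out.append(((b >> 4) - 8) * scale)
    return out[:n]


def dot_q4(packed, scale, x):
    """Dot product against 4-bit weights, computed in ordinary float arithmetic."""
    w = dequantize_4bit(packed, scale, len(x))
    # The multiply-adds below run at full float precision, not at 4 bits.
    return sum(wi * xi for wi, xi in zip(w, x))
```

So the weights take half a byte each in memory, but every use pays for the unpack and rescale, and the math itself still happens in floats.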

And widespread hardware support for 4-bit will take some time. If the HW makers started designing 4-bit silicon in 2022, then we are still years away.