by londons_explore
1087 days ago
It's a shame that large language models are mostly moving to 4-bit weights for inference, and a bunch of papers have shown promising techniques for training in 4-bit too... Remember that switching from 16-bit to 4-bit lets you store 4x as many weights in the same memory, load 4x as many weights from RAM per second, and use ~1/16 of the silicon area for the calculations (multiplier area scales approximately with the square of the number of bits). That smaller silicon area will let you do more per $ too...

And widespread hardware support for 4-bit will take some time. If the HW makers started designing 4-bit silicon in 2022, then we are still years away.
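
For anyone who wants to sanity-check the 16-bit to 4-bit arithmetic, here is a rough sketch. The function names and the 16 GB memory figure are made up for illustration, and the area estimate just applies the "multiplier area scales with bits squared" rule of thumb from the comment; it is not a real silicon model.

```python
def scaling_gains(old_bits: int, new_bits: int) -> dict:
    """Back-of-envelope gains from narrowing weights from old_bits to new_bits."""
    ratio = old_bits / new_bits
    return {
        # Same memory holds `ratio` times as many weights.
        "weights_per_byte_gain": ratio,
        # Same memory bandwidth moves `ratio` times as many weights per second.
        "weights_per_second_gain": ratio,
        # Rule of thumb (assumption): multiplier area ~ bits squared.
        "multiplier_area_fraction": (new_bits / old_bits) ** 2,
    }


def weights_that_fit(memory_bytes: int, bits_per_weight: int) -> int:
    """How many weights of the given width fit in memory_bytes."""
    return memory_bytes * 8 // bits_per_weight


gains = scaling_gains(16, 4)
print(gains)

# Hypothetical 16 GB accelerator, purely for illustration.
memory_bytes = 16 * 10**9
fit_16bit = weights_that_fit(memory_bytes, 16)
fit_4bit = weights_that_fit(memory_bytes, 4)
print(fit_16bit, fit_4bit)
```

So the same hypothetical 16 GB goes from holding 8 billion 16-bit weights to 32 billion 4-bit weights, and each multiplier needs roughly 1/16 of the area under the bits-squared assumption.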