Hacker News
by int_19h 395 days ago
Have you tried quantizing them down to 4 bits to save on RAM?
1 comment

I have found that even 2-bit quantization works, but you have to make sure you only discard the LABs (that’s what we call the Left Aligned Bits internally). I have no idea why it works so well, but it has cut our costs significantly.
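
For anyone unfamiliar with the general idea, here is a minimal sketch of what 2-bit quantization can look like. It assumes plain uniform min/max quantization and packs four codes into each byte. It is not the LAB scheme described above, whose details are not given here, and all the function names are made up for illustration.

```python
def quantize_2bit(values):
    """Map floats to 2-bit codes (0..3) using uniform min/max scaling."""
    lo, hi = min(values), max(values)
    # 2 bits give 4 levels, i.e. 3 steps between lo and hi.
    # Fall back to 1.0 when all values are equal to avoid dividing by zero.
    scale = (hi - lo) / 3 or 1.0
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def pack_2bit(codes):
    """Pack four 2-bit codes into each byte, first code in the low bits."""
    packed = bytearray((len(codes) + 3) // 4)
    for i, code in enumerate(codes):
        packed[i // 4] |= (code & 0b11) << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, count):
    """Recover `count` 2-bit codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(count)]


def dequantize_2bit(codes, lo, scale):
    """Map 2-bit codes back to approximate float values."""
    return [lo + code * scale for code in codes]


weights = [-1.0, -0.2, 0.3, 1.0, 0.9]
codes, lo, scale = quantize_2bit(weights)
packed = pack_2bit(codes)
restored = dequantize_2bit(unpack_2bit(packed, len(weights)), lo, scale)
```

With this layout, five values fit in two bytes, and each restored value is off by at most half a quantization step (`scale / 2`). Real schemes usually quantize in small blocks with a separate scale per block, which this sketch skips.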