What is the effect of this less bits? Is it like truncating hashes where you start going off into the wrong thing entirely, or more like less accuracy so that if you are talking about soft penguins it will start thinking you mean wet penguins?
And is there a domain specific term I can look into if I wanted to read about someone trying to keep all the bits, but the runtime (trying to save ram) focusing in on parts of the data instead of this quantization?
I'm pretty far from an expert. But, at it's core ML is a bunch of matrix multiplications glued together with non-linear functions. So, quantization leads to less accuracy in the matrices of weights. Not, changes in hashes where 1 wrong bit is meaningless.
And is there a domain specific term I can look into if I wanted to read about someone trying to keep all the bits, but the runtime (trying to save ram) focusing in on parts of the data instead of this quantization?