by oh_teh_meows
3275 days ago
"A technique called weight quantization, for example, represents each neural network parameter with only a few bits, sometimes a single bit, instead of the standard 32...The models are equally accurate, but the compressed version runs about 20 times faster." This is pretty great. How can we tell if a particular ML problem is amenable to weight quantization without sacrificing accuracy? |
|
if you have a 24-bit value.. say, a 24-bit color.. that means you have ~16 million (2^24 == 16777216) possible colors
but if you only want to use 200 colors you can, instead of storing each one as the full 24-bit value, use an 8-bit value (2^8 == 256 > 200) as an index into a table that holds the desired full 24-bit values
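a rough sketch of that color-index idea in python, just to make the arithmetic concrete.. the names here (bits_needed, build_palette) are made up for illustration and aren't from any library

```python
def bits_needed(n_values: int) -> int:
    """Smallest number of bits that can index n_values distinct entries."""
    return max(1, (n_values - 1).bit_length())


def build_palette(pixels):
    """Replace each full-width color with a small index into a lookup table."""
    palette = sorted(set(pixels))                 # one entry per distinct color
    lookup = {color: i for i, color in enumerate(palette)}
    indices = [lookup[color] for color in pixels]  # small ints instead of 24-bit values
    return palette, indices


# Toy image: 5 pixels, but only 3 distinct 24-bit colors.
pixels = [0xFF0000, 0x00FF00, 0xFF0000, 0x0000FF, 0x00FF00]
palette, indices = build_palette(pixels)

# Looking each index up in the palette recovers the original colors exactly.
restored = [palette[i] for i in indices]

index_bits = bits_needed(len(palette))   # bits per pixel after indexing
index_bits_200 = bits_needed(200)        # the 200-color case from above
```

the saving only shows up when there are many more pixels than palette entries, since the table itself still stores the full 24-bit values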
so you have to ask yourself.. which parameters of my neural net can be represented as an index? or, which parameters take fewer distinct values than their bit width can represent?
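one way to poke at that question is to snap some weights to a small table of values and look at how much error it introduces at different bit widths.. the sketch below uses plain uniform quantization on made-up gaussian "weights", which is an assumption on my part and not necessarily the method from the quoted article.. it also only measures error on the weights themselves, so for a real model you'd still want to compare accuracy on held-out data before and after

```python
import random


def quantize_uniform(weights, bits):
    """Map each weight to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    lo, hi = min(weights), max(weights)
    step = (hi - lo) / (levels - 1) if hi > lo else 1.0
    codebook = [lo + k * step for k in range(levels)]
    # Each weight becomes a small integer index into the codebook.
    indices = [min(levels - 1, round((w - lo) / step)) for w in weights]
    return codebook, indices


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
# Hypothetical stand-in for a layer's weights.
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]

errors = {}
for bits in (1, 2, 4, 8):
    codebook, indices = quantize_uniform(weights, bits)
    restored = [codebook[i] for i in indices]
    errors[bits] = mse(weights, restored)

for bits, err in errors.items():
    print(f"{bits} bit(s): mse = {err:.6f}")
```

if the error (and, more importantly, the model's accuracy) barely moves as you drop bits, that's a hint the problem tolerates quantization.. if it falls off a cliff, it doesn't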
wiki defines ann parameters as:
An ANN is typically defined by three types of parameters:
here is a great paper that tries to answer this question for you in a way that highlights the error resulting from quantization decisions(i)
(o) https://en.wikipedia.org/wiki/Artificial_neural_network#Netw...
(i) https://www.cmpe.boun.edu.tr/~ethem/files/papers/fatih_icann...