by oh_teh_meows 3275 days ago
"A technique called weight quantization, for example, represents each neural network parameter with only a few bits, sometimes a single bit, instead of the standard 32...The models are equally accurate, but the compressed version runs about 20 times faster."

This is pretty great. How can we tell if a particular ML problem is amenable to weight quantization without sacrificing accuracy?
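
The "sometimes a single bit" case from the quote can be sketched roughly like this. It assumes a BinaryConnect/XNOR-Net-style scheme, which is an assumption since the article doesn't say which method it used. Each weight keeps only its sign, and the whole tensor shares one full-precision scale, alpha = mean(|w|). The layer shape and weight distribution are made up. The sketch only shows the 32x storage reduction implied by "instead of the standard 32"; it says nothing about the 20x speedup or the accuracy claim.

```python
import numpy as np

def binarize(weights):
    """1-bit quantization sketch: keep sign(w) per weight plus one shared
    float scale alpha = mean(|w|), so w is approximated by alpha * sign(w)."""
    alpha = np.float32(np.abs(weights).mean())
    signs = weights >= 0                    # one bit of information per weight
    packed = np.packbits(signs.ravel())     # 8 weights per stored byte
    return alpha, packed

def unbinarize(alpha, packed, shape):
    """Rebuild approximate float weights from the packed sign bits."""
    n = int(np.prod(shape))
    signs = np.unpackbits(packed)[:n].astype(np.float32) * 2 - 1  # {0,1} -> {-1,+1}
    return (alpha * signs).reshape(shape)

# Made-up stand-in for one trained layer's float32 weights.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 128)).astype(np.float32)

alpha, packed = binarize(w)
w_hat = unbinarize(alpha, packed, w.shape)
print(w.nbytes, packed.nbytes)  # 131072 4096
```

The single shared scale is what keeps the rebuilt weights at roughly the right magnitude. Without it, every weight would become exactly +1 or -1.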


weight quantization is basically a list of short values (indices) that point into a lookup table holding the full-precision values you actually want

if you have a 24-bit value, say a 24-bit color, you have 2^24 = 16,777,216 (~16 million) possible colors

but if you only want to use 200 colors, you don't have to store each one as a full 24-bit value. you can use an 8-bit value instead (2^8 = 256 > 200) as an index into a table that holds the desired full 24-bit values
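
Here is the palette idea in numbers, as a small sketch. The palette and the image are made up, and names like `palette` and `indices` are only illustrative:

```python
import numpy as np

# Made-up palette: 200 distinct 24-bit colors (each < 2**24), stored once
# at full width in a lookup table.
palette = np.arange(200, dtype=np.uint32) * 83_000

# Made-up image: 1,000 pixels, each stored as an 8-bit index into the palette.
# 2**8 = 256 >= 200, so one byte per pixel is enough.
rng = np.random.default_rng(0)
indices = rng.integers(0, len(palette), size=1000, dtype=np.uint8)

# Decoding is just a table lookup: index -> full 24-bit color.
pixels = palette[indices]

full_bits = pixels.size * 24                          # every pixel at 24 bits
indexed_bits = indices.size * 8 + palette.size * 24   # 8-bit indices + the table
print(full_bits, indexed_bits)  # 24000 12800
```

The table itself has to be stored as well. That is why the saving here is 24000 vs 12800 bits rather than a flat 3x, and why the trick pays off more as the number of entries grows relative to the palette size.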

so you have to ask yourself: which parameters of my neural net can be represented as an index? in other words, which parameters take on far fewer distinct values than their storage type can represent?
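
One way to check this on a real layer is sketched below, under a few assumptions. The helper `kmeans_codebook` is hypothetical. It uses plain 1-D k-means to pick 2^bits shared values, similar to the weight sharing used in Deep Compression, and the random normal weights stand in for a trained layer. If the reconstruction error stays small at a low bit width, the layer is probably a good candidate.

```python
import numpy as np

def kmeans_codebook(weights, bits, iters=20):
    """Hypothetical helper: learn a 2**bits-entry codebook for a weight array
    with plain 1-D k-means and return (codebook, uint8 indices)."""
    assert 1 <= bits <= 8, "indices are stored in uint8"
    flat = weights.ravel().astype(np.float32)
    k = 2 ** bits
    # Start the centroids evenly spaced across the weight range.
    codebook = np.linspace(flat.min(), flat.max(), k, dtype=np.float32)
    for _ in range(iters):
        # Assign each weight to its nearest centroid.
        idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Move each centroid to the mean of the weights assigned to it.
        for j in range(k):
            members = flat[idx == j]
            if members.size:
                codebook[j] = members.mean()
    idx = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return codebook, idx.astype(np.uint8).reshape(weights.shape)

# Made-up stand-in for a trained layer's weights.
rng = np.random.default_rng(0)
w = rng.normal(0.0, 0.05, size=(256, 128)).astype(np.float32)

print(np.unique(w).size)          # distinct float32 values before quantization
codebook, idx = kmeans_codebook(w, bits=4)
w_hat = codebook[idx]             # dequantize: look each index up in the table
print(codebook.size, idx.dtype)   # 16 uint8
print(float(np.abs(w - w_hat).mean()))  # average reconstruction error
```

For simplicity the 4-bit indices sit in a uint8 array; a real implementation would pack two per byte. Uniform (linear) quantization, which rescales and rounds instead of learning a table, is the other common option.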

wikipedia(o) defines ANN parameters as:

An ANN is typically defined by three types of parameters:

    The connection pattern between the different layers of neurons
    The weights of the connections, which are updated in the learning process.
    The activation function that converts a neuron's weighted input to its output activation.

here is a great paper that tries to answer this question, highlighting the error that results from different quantization decisions (i)

(o) https://en.wikipedia.org/wiki/Artificial_neural_network#Netw...

(i) https://www.cmpe.boun.edu.tr/~ethem/files/papers/fatih_icann...

By applying weight quantization and measuring the resulting loss in accuracy. Predicting in advance which techniques will work in deep learning is still much harder than stumbling on one that works by trial and error.
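
A sketch of that trial-and-error loop follows, under stated assumptions. A least-squares linear classifier on synthetic data stands in for a real network. `quantize_uniform` is a hypothetical helper that does symmetric uniform quantization, which is a different scheme from the lookup table described above. The model is trained once, its weights are quantized at several bit widths, and held-out accuracy is compared against the full-precision baseline.

```python
import numpy as np

def quantize_uniform(w, bits):
    """Hypothetical helper: symmetric uniform ("fake") quantization.
    Rounds each weight to one of 2**bits - 1 evenly spaced levels sharing a
    single scale, then maps it back to floats so accuracy can be measured."""
    assert bits >= 2, "use a sign-based scheme for 1 bit"
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    return np.clip(np.round(w / scale), -qmax, qmax) * scale

def accuracy(weights, X, y):
    """Fraction of samples whose predicted sign matches the +/-1 label."""
    return float(np.mean(np.sign(X @ weights) == y))

# Synthetic binary classification data (stand-in for a real task).
rng = np.random.default_rng(0)
n, d = 2000, 50
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = np.where(X @ w_true + rng.normal(scale=2.0, size=n) > 0, 1.0, -1.0)
X_train, y_train, X_test, y_test = X[:1500], y[:1500], X[1500:], y[1500:]

# "Train" a linear classifier by least squares (stand-in for a trained net).
w, *_ = np.linalg.lstsq(X_train, y_train, rcond=None)

baseline = accuracy(w, X_test, y_test)
results = {bits: accuracy(quantize_uniform(w, bits), X_test, y_test)
           for bits in (8, 4, 2)}
for bits, acc in results.items():
    print(f"{bits}-bit: accuracy {acc:.3f} (drop {baseline - acc:+.3f})")
```

On a real model the same loop applies: swap in the actual weights and evaluation set, then pick the smallest bit width whose accuracy drop you can live with.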