Hacker News
by dna_polymerase 2891 days ago
Int8 and Int16? I've never worked with quantized models. Would anyone mind sharing their experience? Do such models achieve state-of-the-art performance?
2 comments

a) that's just for inference, you don't train with that.

b) a fully float-trained model "quantized" to int16 typically loses some precision, but often works well enough. It's also usually faster (if implemented properly).

c) there's a version where you go all the way down to int1 (single bits) and binary ops instead of multiply-adds on floats and ints. It can solve some problems. And properly compiled, it's wicked fast.
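
To make point c) concrete, here is a minimal sketch of how a dot product over 1-bit values can be done with binary ops. It assumes the usual binarized-network convention that each bit stands for -1 or +1, so a multiply becomes XNOR and the sum becomes a popcount. The names `pack_signs` and `binary_dot` are made up for illustration and don't come from any particular library.

```python
import random

def pack_signs(values):
    """Pack a sequence of -1/+1 values into one int, one bit each (bit set = +1)."""
    bits = 0
    for i, x in enumerate(values):
        if x > 0:
            bits |= 1 << i
    return bits

def binary_dot(a_bits, b_bits, n):
    """Dot product of two packed -1/+1 vectors of length n, using only bit ops."""
    mask = (1 << n) - 1                 # keep only the n valid bit positions
    agree = ~(a_bits ^ b_bits) & mask   # XNOR: 1 wherever the two signs match
    matches = bin(agree).count("1")     # popcount
    return 2 * matches - n              # (+1 per match) + (-1 per mismatch)

def plain_dot(a, b):
    """Reference multiply-add version for comparison."""
    return sum(x * y for x, y in zip(a, b))

random.seed(0)
n = 64
a = [random.choice((-1, 1)) for _ in range(n)]
b = [random.choice((-1, 1)) for _ in range(n)]

# Both lines should print the same number.
print(binary_dot(pack_signs(a), pack_signs(b), n))
print(plain_dot(a, b))
```

In pure Python this is not fast, of course. The speed comes from doing the XNOR and popcount on whole machine words in compiled code, which is presumably what "properly compiled" is getting at.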

> there's a version where you go all the way down to int1 (bits)

There's also a Zen version that uses just 0.5 bits. </joke>

We lose a tiny bit of accuracy when quantizing for TensorFlow Lite on Android, and that's about it. I was pretty impressed.
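
For anyone wondering where that small loss comes from, here is a rough sketch of quantizing already-trained float weights to int8 or int16 and measuring the rounding error. It assumes the simplest scheme (symmetric, one scale for the whole tensor). Real toolchains may use per-channel scales and zero points, so treat this as an illustration and not as what TensorFlow Lite does internally. `quantize` and `dequantize` are hypothetical helper names.

```python
import random

def quantize(values, bits=8):
    """Symmetric per-tensor quantization of floats to signed integers.

    Assumes at least one value is nonzero (otherwise the scale would be 0).
    """
    qmax = 2 ** (bits - 1) - 1                    # 127 for int8, 32767 for int16
    scale = max(abs(v) for v in values) / qmax    # float step size per integer level
    q = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return q, scale

def dequantize(q, scale):
    """Map the integers back to approximate float values."""
    return [x * scale for x in q]

random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(1000)]  # stand-in for trained weights

for bits in (8, 16):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    max_err = max(abs(w - r) for w, r in zip(weights, restored))
    print(f"int{bits}: scale={scale:.6f}, max abs error={max_err:.6f}")
```

The per-weight error is bounded by half a quantization step, and the int16 step is far smaller than the int8 one. How much that matters for final model accuracy depends on the model, which this sketch doesn't measure.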