

	by elephantum 1741 days ago
	I see: in order to benefit, the model has to be quantized. It is not clear which kinds of quantization are supported. Both FP16 and INT8?

1 comments

Marat_Dukhan 1741 days ago

In order to benefit from optimizations in *this blog post* the model needs to be quantized to 8-bit integers. However, XNNPACK supports floating-point inference as well (including with FP16 weights), see https://blog.tensorflow.org/2020/07/accelerating-tensorflow-...


elephantum 1741 days ago

Thanks!

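For readers unsure what the thread means by "quantized to 8-bit integers" versus "FP16 weights", here is a minimal Python sketch of the two ideas. It is a generic illustration only: the function names are hypothetical, the affine scheme shown is a common textbook formulation, and nothing here is taken from XNNPACK or TensorFlow Lite internals.

```python
import struct


def quantize_int8(values):
    """Affine quantization to int8 (illustrative scheme, assumed for this sketch).

    Maps floats to integers in [-128, 127] via q = round(x / scale) + zero_point.
    """
    lo = min(min(values), 0.0)  # the representable range must include 0.0
    hi = max(max(values), 0.0)
    scale = (hi - lo) / 255.0 or 1.0  # fall back to 1.0 if all values are 0
    zero_point = round(-128 - lo / scale)
    q = [max(-128, min(127, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point


def dequantize_int8(q, scale, zero_point):
    """Recover approximate floats from int8 values."""
    return [(x - zero_point) * scale for x in q]


def to_fp16_and_back(values):
    """Round-trip floats through IEEE 754 half precision (struct format 'e')."""
    return [struct.unpack("<e", struct.pack("<e", v))[0] for v in values]


weights = [-1.0, -0.25, 0.0, 0.5, 1.0]

# 8-bit integer path: values are stored as small integers plus (scale, zero_point).
q, scale, zero_point = quantize_int8(weights)
restored = dequantize_int8(q, scale, zero_point)

# FP16 path: values stay floating-point, just stored with fewer bits.
half = to_fp16_and_back(weights + [0.1])

print(q, scale, zero_point)
print(restored)
print(half)
```

In the integer case the arithmetic itself can run on 8-bit values, which is the kind of model the reply says the blog post's optimizations target. In the FP16 case only the storage format changes and the values remain floating-point.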