HN Mirror

	by Der_Einzige 1108 days ago
	Any data on inference speed? I've found that the non-quantized model was much faster on GPU than the quantized versions, which showed lower GPU utilization.

1 comment

It's a RAM tradeoff. If you have enough GPU RAM to load the non-quantized model, it may be faster.
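
To make the RAM side of that tradeoff concrete, here is a rough back-of-the-envelope sketch. It only estimates memory for the weights, and the 7B parameter count, the 20% overhead factor, and the function names are illustrative assumptions, not figures from this thread. It says nothing about speed, which depends on the kernels and hardware involved.

```python
def weight_memory_gib(num_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


def fits_in_vram(num_params: int, bits_per_weight: float, vram_gib: float,
                 overhead_fraction: float = 0.2) -> bool:
    """Rough check: do the weights plus an assumed overhead fit in GPU RAM?

    overhead_fraction is a guessed allowance for activations, KV cache,
    and framework buffers. Real overhead varies with the workload.
    """
    needed = weight_memory_gib(num_params, bits_per_weight) * (1 + overhead_fraction)
    return needed <= vram_gib


if __name__ == "__main__":
    params = 7_000_000_000  # hypothetical 7B-parameter model
    for label, bits in [("fp16", 16), ("int8", 8), ("4-bit", 4)]:
        gib = weight_memory_gib(params, bits)
        print(f"{label}: ~{gib:.1f} GiB weights, "
              f"fits in 12 GiB: {fits_in_vram(params, bits, 12)}, "
              f"fits in 24 GiB: {fits_in_vram(params, bits, 24)}")
```

Under these assumptions, the fp16 weights need a 24 GiB card while the quantized variants fit in 12 GiB, which is the case where quantization buys you something even if it runs slower.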