| HN Mirror

	by api 1065 days ago
	Quantization is a RAM tradeoff. If you have enough GPU RAM to load the non-quantized model, running that may be faster.
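
To put rough numbers on the tradeoff, here is a back-of-the-envelope sketch. It assumes the weights dominate memory use and ignores activations, KV cache, and framework overhead, so real usage will be higher. The 7-billion-parameter model size and the 8 GB GPU are hypothetical figures chosen for illustration, and `weight_bytes` and `fits` are made-up helper names, not part of any library.

```python
def weight_bytes(n_params: int, bits_per_weight: int) -> int:
    """Approximate bytes needed to hold the weights alone."""
    return n_params * bits_per_weight // 8


def fits(n_params: int, bits_per_weight: int, gpu_bytes: int) -> bool:
    """True if the weights alone fit in the given amount of GPU memory."""
    return weight_bytes(n_params, bits_per_weight) <= gpu_bytes


# Hypothetical example: a 7B-parameter model on a GPU with 8 GB of memory.
N_PARAMS = 7_000_000_000
GPU_BYTES = 8 * 10**9

for bits in (16, 8, 4):
    size_gb = weight_bytes(N_PARAMS, bits) / 10**9
    verdict = "fits" if fits(N_PARAMS, bits, GPU_BYTES) else "does not fit"
    print(f"{bits}-bit weights: ~{size_gb:.1f} GB, {verdict}")
```

Under these assumptions the 16-bit weights need about 14 GB and do not fit, while the 8-bit and 4-bit versions do. If the GPU were large enough for the 16-bit weights, there would be no memory reason to quantize, and the non-quantized model may well run faster.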