


	by brucethemoose2 1106 days ago
	exLlama supports batching, and I believe it claws back much of the throughput loss from quantization (depending on the exact settings you use to quantize). And as said below, whatever throughput you lose is going to be massively offset by the ability to use smaller single GPUs.