| HN Mirror


	by vanillax 976 days ago
	- GPTQ: pure GPU inference, used with AutoGPTQ, exllama, exllamav2, offers only 4-bit quantization
	What are AutoGPTQ and exllama, and what does it mean that GPTQ only works with them? Are those frameworks like TensorFlow?