| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by woodson 1269 days ago
	I think quantization (e.g. 4-bit, https://arxiv.org/abs/2212.09720) and sparsity (e.g. SparseGPT, https://arxiv.org/abs/2301.00774) will bring down inference cost. Edit: This isn’t handwaving btw, this is to say some fairly decent solutions are available now.