Hacker News new | ask | show | jobs
by woodson 1269 days ago
I think quantization (e.g. 4-bit, https://arxiv.org/abs/2212.09720) and sparsity (e.g. SparseGPT, https://arxiv.org/abs/2301.00774) will bring down inference cost.

Edit: This isn’t handwaving btw, this is to say some fairly decent solutions are available now.