Y
Hacker News
new
|
ask
|
show
|
jobs
by
woodson
1269 days ago
I think quantization (e.g. 4-bit,
https://arxiv.org/abs/2212.09720
) and sparsity (e.g. SparseGPT,
https://arxiv.org/abs/2301.00774
) will bring down inference cost.
Edit: This isn’t handwaving btw, this is to say some fairly decent solutions are available now.