by corvec 1175 days ago
Define "comprehensive"?

There are some benchmarks here: https://www.reddit.com/r/LocalLLaMA/comments/1248183/i_am_cu... and here: https://nolanoorg.substack.com/p/int-4-llama-is-not-enough-i...

Check out the original GPTQ quantization paper, which has some benchmarks: https://arxiv.org/pdf/2210.17323.pdf and this second paper, which also has benchmarks and explains how its authors determined that 4-bit quantization is optimal compared to 3-bit: https://arxiv.org/pdf/2212.09720.pdf

I also think the discussion of that second paper here is interesting, though the thread doesn't include its own benchmarks: https://github.com/oobabooga/text-generation-webui/issues/17...