by FieryTransition 777 days ago
Because the quantization code takes time to become bug-free whenever a new architecture is released. If a model was quantized while the quantizer had a known bug, the resulting quantized files are effectively buggy too, and they need to be requantized with a newer version of llama.cpp that includes the fix.
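To make that concrete, here is a toy sketch in Python. It is not llama.cpp's actual quantization scheme; the `quantize` function, its `buggy` flag, and the bug itself (taking the scale from the largest value instead of the largest absolute value) are all made up for illustration. The point is only that the error gets baked into the stored file, so fixing the quantizer does nothing for files that were already produced.

```python
def quantize(weights, buggy=False):
    """Toy symmetric 8-bit quantization of a list of floats."""
    # Hypothetical bug: the scale is derived from the largest value
    # rather than the largest absolute value, so large negative
    # weights get clipped.
    peak = max(weights) if buggy else max(abs(w) for w in weights)
    scale = (peak / 127) or 1.0  # avoid a zero scale for all-zero input
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return {"scale": scale, "q": q}


def dequantize(stored):
    """Recover approximate floats from the stored integers and scale."""
    return [v * stored["scale"] for v in stored["q"]]


def max_error(weights, stored):
    """Largest absolute difference between original and recovered weights."""
    return max(abs(w - r) for w, r in zip(weights, dequantize(stored)))


weights = [-2.0, 0.5, 1.0]

old_file = quantize(weights, buggy=True)  # produced before the fix
new_file = quantize(weights)              # requantized after the fix

print("error in old file:", max_error(weights, old_file))
print("error in new file:", max_error(weights, new_file))
```

The old file stays wrong no matter which version of the code loads it, because the clipped values are what got written out. Only going back to the original weights and quantizing again repairs it.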