by francisduvivier 965 days ago
Correction: it looks like this has to do with the quantization instead: 8-bit quantization works, while anything lower does not seem to work. Another working model example (no conversion needed): https://huggingface.co/TheBloke//Yarn-Mistral-7B-64k-GGUF/ya...