by francisduvivier 966 days ago
I had the same issue. In my case it was because I was trying to use a Llama 2 model. When trying with CodeLlama https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/tree/main, which is based on the first Llama, it works.
1 comment

Correction: it looks like this has to do with the quantization rather than the Llama version: 8-bit quantization works, while anything lower does not seem to work. Another working model (no conversion needed): https://huggingface.co/TheBloke//Yarn-Mistral-7B-64k-GGUF/ya...
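
If it helps anyone, here is a rough sketch of how one could filter a list of GGUF file names down to the 8-bit ones. The file names below are hypothetical and only assume the common "<model>.Q<bits>_<variant>.gguf" naming pattern; the helper names (quant_bits, pick_8bit) are made up for illustration and are not part of any library.

```python
import re

# Hypothetical file names, assuming the "<model>.Q<bits>_<variant>.gguf" pattern.
GGUF_FILES = [
    "codellama-7b.Q2_K.gguf",
    "codellama-7b.Q4_K_M.gguf",
    "codellama-7b.Q5_K_M.gguf",
    "codellama-7b.Q8_0.gguf",
]

# Captures the bit width from a tag such as ".Q8_0.gguf" or ".Q4_K_M.gguf".
QUANT_TAG = re.compile(r"\.Q(\d+)_[A-Za-z0-9_]+\.gguf$")


def quant_bits(filename):
    """Return the quantization bit width parsed from the name, or None."""
    match = QUANT_TAG.search(filename)
    return int(match.group(1)) if match else None


def pick_8bit(filenames):
    """Keep only the files whose name indicates 8-bit quantization."""
    return [name for name in filenames if quant_bits(name) == 8]


if __name__ == "__main__":
    for name in pick_8bit(GGUF_FILES):
        print(name)
```

This only looks at the file name, so it says nothing about why lower-bit files fail; it just narrows the download to the variant that worked for me.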