


	by francisduvivier 966 days ago
	I had the same issue. In my case it was because I was trying to use a Llama 2 model. When I tried CodeLlama (https://huggingface.co/TheBloke/CodeLlama-7B-GGUF/tree/main), which is based on the first Llama, it worked.

1 comment

Correction: it looks like the cause is the quantization instead: 8-bit quantization works, while anything lower does not seem to work. Another working model (no conversion needed): https://huggingface.co/TheBloke//Yarn-Mistral-7B-64k-GGUF/ya...
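
If you want to pick out the 8-bit files from a repo listing, here is a rough sketch. It assumes the file names carry a quantization tag such as `Q8_0` or `Q4_K_M` just before the `.gguf` extension. The names in `FILES` and the helper functions are made up for illustration; they are not copied from either repo above.

```python
import re

# Hypothetical file names following an assumed "<model>.<QUANT>.gguf" pattern.
FILES = [
    "example-7b.Q4_K_M.gguf",
    "example-7b.Q5_K_M.gguf",
    "example-7b.Q8_0.gguf",
]

# Captures the bit width from tags like "Q8_0" or "Q4_K_M" at the end of the name.
QUANT_TAG = re.compile(r"\.Q(\d+)_[A-Za-z0-9_]+\.gguf$")


def quant_bits(filename):
    """Return the bit width encoded in the quantization tag, or None if there is no tag."""
    match = QUANT_TAG.search(filename)
    return int(match.group(1)) if match else None


def eight_bit_files(filenames):
    """Keep only the files whose tag says 8-bit quantization."""
    return [name for name in filenames if quant_bits(name) == 8]


print(eight_bit_files(FILES))
```

This only filters by name; it does not inspect the file contents, so it is no better than the naming convention it assumes.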