| HN Mirror



	by manca 843 days ago
	I've tried code-llama with Ollama, along with Continue.dev, and found it to be pretty good. The only downside is that I couldn't "productively" run the 70B version, even on my MBP with an M3 Max and 36GB of RAM (which, interestingly, should be enough to hold the quantized model weights). It was simply painfully slow. The 34B one works well enough for most of my use cases, so I am happy.
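
As a rough check on the "should be enough" claim, here is a back-of-the-envelope sketch of weight memory at different bit widths. It assumes exactly 7, 34, and 70 billion parameters and a uniform number of bits per weight. It ignores the KV cache, activations, per-block quantization metadata, and whatever the OS itself needs, so real usage will be higher. The function name `weight_bytes` is made up for illustration.

```python
def weight_bytes(n_params: float, bits_per_weight: float) -> float:
    """Approximate bytes needed for the weights alone (assumption: uniform bit width)."""
    return n_params * bits_per_weight / 8


GB = 1e9  # decimal gigabytes, as used in RAM marketing figures

# Assumed parameter counts; the real models differ slightly from these round numbers.
MODELS = {"7B": 7e9, "34B": 34e9, "70B": 70e9}

for name, n_params in MODELS.items():
    for bits in (16, 8, 4):
        size_gb = weight_bytes(n_params, bits) / GB
        print(f"{name} at {bits}-bit: ~{size_gb:.1f} GB of weights")
```

Under these assumptions, 70B at 4 bits comes to about 35 GB of weights, which fits in 36GB only on paper: almost nothing is left for the cache and the rest of the system, which may explain why it was so slow in practice.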


0x008 843 days ago

I tried to use codellama 34B and I think it is pretty bad. For example, I asked it to convert a comment into a docstring and it hallucinated a whole function around it.


gpjt 842 days ago

What quantization were you using? I've been getting some weird results with 34b quantized to 4 bits -- glitching, dropped tokens, generating Java rather than Python as requested. But 7b, even at 4 bits, works OK. Posted about it earlier this evening: https://www.gilesthomas.com/2024/02/llm-quantisation-weirdne...
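
For anyone unsure what "quantized to 4 bits" means in practice, here is a minimal sketch of symmetric round-to-nearest quantization on a handful of made-up weights. This is an illustrative assumption, not the scheme Ollama or any particular tool uses; real quantizers typically work on small blocks of weights with a separate scale per block. The function names and sample values are hypothetical.

```python
def quantize_4bit(values):
    """Symmetric round-to-nearest: map floats to integer codes in [-7, 7]."""
    largest = max(abs(v) for v in values)
    scale = largest / 7 if largest > 0 else 1.0  # avoid dividing by zero
    codes = [max(-7, min(7, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Recover approximate floats from the integer codes."""
    return [c * scale for c in codes]


# Made-up weights, purely for illustration.
weights = [0.02, -0.51, 0.33, 0.9, -0.07]
codes, scale = quantize_4bit(weights)
restored = dequantize(codes, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))

print("codes:   ", codes)
print("restored:", [round(r, 3) for r in restored])
print("max error:", round(max_err, 4))
```

The point of the sketch is only that every weight gets snapped to one of 15 levels, so small weights can collapse to zero and the rounding error is bounded by half the scale. It says nothing about why a particular 34b build would glitch.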


3abiton 843 days ago

Same, CodeLlama 70B is known to suck. Deepseek is the best for coding so far in my experience, and Mixtral 8x7B is another great contender (to be frank, for most tasks). Miqu is making a buzz, but I haven't tested it personally yet.
