Hacker News new | ask | show | jobs
by MyFirstSass 929 days ago
Thanks for the tip. I'm on the M2 Air with 16 GB's of ram.

If anyone has faster than 12tkps on Air's let me know.

I'm using the LM Studio GUI over llama.cpp with the "Apple Metal GPU" option. Increasing CPU threads seemingly does nothing either without metal.

Ram usage hovers at 5.5GB with a q5_k_m of Mistral.

2 comments

Try different quantization variations. I got vastly different speeds depending on which quantization I chose. I believe q4_0 worked very well for me. Although for a 7B model q8_0 runs just fine too with better quality.
LlamaFile typically outperforms LM Studio and even Ollama.