HN Mirror


by MyFirstSass 929 days ago

Thanks for the tip. I'm on the M2 Air with 16 GB of RAM.

If anyone gets faster than 12 tokens/s on an Air, let me know.

I'm using the LM Studio GUI over llama.cpp with the "Apple Metal GPU" option. Without Metal, increasing the number of CPU threads doesn't seem to help either.

RAM usage hovers around 5.5 GB with a q5_k_m quantization of Mistral.

2 comments

M4v3R 929 days ago

Try different quantization variants. I got vastly different speeds depending on which quantization I chose. I believe q4_0 worked very well for me. That said, for a 7B model q8_0 runs just fine too, with better quality.
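As a rough back-of-the-envelope sketch of why the quantization choice can change speed so much: token generation is often limited by how fast the weights can be read from memory, so a smaller file tends to mean more tokens per second. The numbers below are assumptions, not measurements: about 7.24B parameters for Mistral 7B, about 100 GB/s of memory bandwidth for a base M2, and an approximate average of 5.7 bits per weight for q5_k_m (it mixes tensor types). The function names are made up for illustration.

```python
# Bits per weight for a few llama.cpp quantization formats.
# q4_0 and q8_0 follow from the block layout: 32 weights plus one
# 16-bit scale per block. The q5_k_m value is an ASSUMED approximate
# average, because that format mixes several tensor types.
BITS_PER_WEIGHT = {
    "q4_0": (32 * 4 + 16) / 32,
    "q5_k_m": 5.7,  # assumption, approximate
    "q8_0": (32 * 8 + 16) / 32,
}


def model_size_gb(n_params: float, quant: str) -> float:
    """Estimated weight size in GB (10^9 bytes) for a given quantization."""
    return n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9


def max_tokens_per_s(n_params: float, quant: str, bandwidth_gb_s: float) -> float:
    """Rough upper bound, assuming every weight is read once per token."""
    return bandwidth_gb_s / model_size_gb(n_params, quant)


if __name__ == "__main__":
    N_PARAMS = 7.24e9      # assumed parameter count for Mistral 7B
    BANDWIDTH_GB_S = 100   # assumed memory bandwidth of a base M2

    for quant in BITS_PER_WEIGHT:
        size = model_size_gb(N_PARAMS, quant)
        bound = max_tokens_per_s(N_PARAMS, quant, BANDWIDTH_GB_S)
        print(f"{quant:7s} ~{size:.1f} GB, at most ~{bound:.0f} tokens/s")
```

This is only an upper bound under those assumptions. Real throughput will be lower, and it ignores the KV cache, prompt processing, and whatever else is using memory.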


ukuina 929 days ago

LlamaFile typically outperforms LM Studio and even Ollama.
