| HN Mirror



	by tarruda 930 days ago
	> I can run it on my Macbook Air at 12tkps, can't wait to try this on my desktop.

	That seems kind of low. Are you using Metal GPU acceleration with llama.cpp? I don't have a MacBook, but I saw some llama.cpp benchmarks suggesting it can reach close to 30 tk/s with GPU acceleration.


MyFirstSass 930 days ago

Thanks for the tip. I'm on the M2 Air with 16 GB of RAM.

If anyone gets faster than 12 tk/s on an Air, let me know.

I'm using the LM Studio GUI over llama.cpp with the "Apple Metal GPU" option. Increasing CPU threads seemingly does nothing without Metal either.

RAM usage hovers at 5.5 GB with a q5_k_m quantization of Mistral.


M4v3R 930 days ago

Try different quantization variants. I got vastly different speeds depending on which quantization I chose. I believe q4_0 worked very well for me. For a 7B model, though, q8_0 also runs just fine and gives better quality.
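
As a rough illustration of why the quantization choice matters on a 16 GB machine, here is a back-of-the-envelope sketch of weight storage for a 7B model. It assumes a round 7e9 parameters and the nominal bits per weight of the two block formats named above (32 weights plus one 16-bit scale per block); `weight_gb` and `BITS_PER_WEIGHT` are made-up names for this example, not anything from llama.cpp. It ignores the KV cache and runtime overhead, and it leaves out mixed formats like q5_k_m, whose effective bits per weight vary by tensor.

```python
# Rough weight-memory estimate for a quantized model.
# This is a sketch under stated assumptions, not llama.cpp's own accounting.

# Nominal bits per weight, counting the per-block scale.
BITS_PER_WEIGHT = {
    "q4_0": 4.5,  # 32 four-bit weights + one 16-bit scale per block
    "q8_0": 8.5,  # 32 eight-bit weights + one 16-bit scale per block
}


def weight_gb(n_params: float, bits_per_weight: float) -> float:
    """Return approximate weight storage in decimal gigabytes."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 7e9  # assumed round figure for a "7B" model
    for name, bpw in BITS_PER_WEIGHT.items():
        print(f"{name}: ~{weight_gb(n_params, bpw):.1f} GB")
```

Under these assumptions q4_0 comes to roughly 3.9 GB and q8_0 to roughly 7.4 GB. Token generation is often limited by how fast the weights can be read from memory, so a smaller quantization tends to run faster, though actual speeds depend on the backend and hardware.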


ukuina 929 days ago

LlamaFile typically outperforms LM Studio and even Ollama.
