HN Mirror



	by bastardoperator 475 days ago
	It's fast enough for me to cancel monthly AI services on a mac mini m4 max.

5 comments

diggan 475 days ago

Could you maybe share a lightweight benchmark with the exact model (+ quantization if you're using that), the runtime, the settings used, and how many tokens/second you're getting? Or just a log of the entire run with the stats, if you're using something like llama.cpp, LMDesktop or ollama?

Also, it would be neat if you could say which AI services you were subscribed to. There is a huge difference between a paid Claude subscription and the OpenAI Pro subscription, for example, both in cost and in the quality of responses.


lostmsu 475 days ago

Hm, the AI services over 5 years cost half as much as the minimal M4 Max configuration, which can barely run a severely lobotomized LLaMA 70B. And they provide significantly better models.


Matl 475 days ago

Sure, with something like Kagi you even get many models to choose from for a relatively low price, but not everybody likes sending their codebase and documents over to OpenAI.


nomel 475 days ago

It's probably much worse than that, with the falling prices of compute.


staticman2 475 days ago

Smaller, dumber models are faster than bigger, slower ones.

What model do you find fast enough and smart enough?


Matl 475 days ago

Not OP, but I am finding the DeepSeek R1 distill of Qwen 2.5 32B to have a good speed/smartness ratio on the M4 Pro Mac Mini.


bastardoperator 474 days ago