| HN Mirror

by stared 3 hours ago

I really recommend Qwen3.6 27B.

I ran some tests: its 8-bit version runs at 30 tok/s using llama.cpp with MTP on an M5 Max MacBook. I have 128 GB, but 64 GB is enough. https://github.com/stared/benching-local-llms-on-apple-silic...

On benchmarks, it scores at roughly the level of mid-to-late 2025 SotA models.
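
For anyone checking whether 64 GB really is enough, here is a rough back-of-the-envelope sketch of the memory needed for the weights alone. It assumes 27 billion parameters (taken from the "27B" in the name) and a flat number of bits per weight; real quantized files carry some extra per-block overhead, and the KV cache and the OS need room on top, so treat the numbers as a lower bound. The function name is made up for illustration.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in decimal GB."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: "27B" means roughly 27e9 parameters.
N_PARAMS = 27e9

for bits in (16, 8, 4):
    # Weights only; KV cache and runtime buffers are not included.
    print(f"{bits:>2}-bit: ~{weight_memory_gb(N_PARAMS, bits):.1f} GB")
```

Under these assumptions the 8-bit weights come to about 27 GB, which leaves a fair amount of headroom on a 64 GB machine.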

2 comments

iagooar 2 hours ago

I run the exact same model, on the exact same hardware - amazing results. Pair it with good search skills (Tavily, Brave, Exa) and you have a near-SOTA model on your desk.

wizzledonker 2 hours ago

Did you mean 2025?

stared 2 hours ago

Yes, fixed
