I have a MBP M1 Max 64GB and I get 40t/s with llama.cpp and unsloth q4_k_m on the 30B A3B model. I always use /nothink and Temperature=0.7, TopP=0.8, TopK=20, and MinP=0 - these are the settings recommended for Qwen3 and they make a big difference. With the default settings from llama-server it will always run into an endless loop.
The quality of the output is decent, just keep in mind it is only a 30B model. It also translates really well from french to german and vice versa, much better than Google translate.
Edit: for comparision, Qwen2.5-coder 32B q4 is around 12-14t/s on this M1 which is too slow for me. I usually used the Qwen2.5-coder 17B at around 30t/s for simple tasks. Qwen3 30B is imho better and faster.
I am not a mac person, but I am debating buying one for the unified ram now that the prices seem to be inching down. Is it painful to set up? The general responses I seem to get range from "It is takes zero effort" to "It was a major hassle to set everything up."
> I am not a mac person, but I am debating buying one for the unified ram
Soon some AMD Ryzen AI Max PCs will be available, with unified memory as well. For example the Framework Desktop with up to 128 GB, shared with the iGPU:
It's relatively easy. macOS is easier to set up than Linux, but it will always depend on your specific needs and environment.
E.g.: I go a little bit overboard for the average macOS user:
- custom system- and app-specific keyboard mappings (ultra-modifier on caps-lock; custom tabbing-key-modifier) via Karabiner Elements
- custom trackpad mappings via BetterTouchTool
- custom Time Machine schedule and backup logic; you can vibe-code your install script once and re-use it in the future; just make it idempotent
- custom quake-like Terminal via iTerm
- shell customizations
- custom Alfred workflows
- etc.
If all you need is just a sensible package manager and the terminal to get started, just set up Time Machine with default settings, Homebrew, your shell, and optionally iTerm2, and you're good to go. Other noteworthy power-user tools:
- Hammerspoon
- Syncthing / Resilio Sync
- Arq. Naturally, the usual backup tools also run on macOS: Borg, Kopia, etc.
I mean that's more like old habits no? MacOS is pretty easy to setup. if all your target apps are available on it, you shouldn't have much of a problem.
As a Windows/MacOS/Linux dweller kinto is a godsend so I can have macos keyboard (but you could have linux or windows by default) on all OSes https://kinto.sh/
Honestly, it is quite a hastle, took me 2 hours BUT. if you just take the whole article text and paste that to gemini-2.5-pro and give your circumstance, i think it will give you specific steps for your case and it should be trivial from that moment on
The quality of the output is decent, just keep in mind it is only a 30B model. It also translates really well from french to german and vice versa, much better than Google translate.
Edit: for comparision, Qwen2.5-coder 32B q4 is around 12-14t/s on this M1 which is too slow for me. I usually used the Qwen2.5-coder 17B at around 30t/s for simple tasks. Qwen3 30B is imho better and faster.
[1] parameters for Qwen3: https://huggingface.co/Qwen/Qwen3-30B-A3B
[2] unsloth quant: https://huggingface.co/unsloth/Qwen3-30B-A3B-GGUF
[3] llama.cpp: https://github.com/ggml-org/llama.cpp