|
|
|
|
|
by tripplyons
266 days ago
|
|
I have can host it on my M3 laptop somewhere around 30-40 tokens per second using mlx_lm's server command: mlx_lm.server --model mlx-community/Qwen3-Next-80B-A3B-Instruct-4bit --trust-remote-code --port 4444 I'm not sure if there is support for Qwen3-Next in any releases yet, but when I set up the python environment I had to install mlx_lm from source. |
|