by fshen
52 days ago
I use the same computer as you do.
The M5 can run faster:

  pip install mlx_lm
  python -m mlx_vlm.convert --hf-path Qwen/Qwen3.6-27B --mlx-path ~/.mlx/models/Qwen3.6-27B-mxfp4 --quantize --q-mode mxfp4 --trust-remote-code
  mlx_lm.generate --model ~/.mlx/models/Qwen3.6-27B-mxfp4 -p 'how cpu works' --max-tokens 300

Output:

  Prompt: 13 tokens, 51.448 tokens-per-sec
  Generation: 300 tokens, 35.469 tokens-per-sec
  Peak memory: 14.531 GB
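If you'd rather compare wall-clock time than tokens-per-sec, here is a rough sketch that turns the stats lines above into seconds per phase. It assumes the output keeps the exact "N tokens, R tokens-per-sec" shape shown above; `OUTPUT`, `STAT` and `phase_seconds` are names I made up for illustration, not part of mlx_lm.

```python
import re

# Stats lines copied from the mlx_lm.generate output quoted above.
OUTPUT = """\
Prompt: 13 tokens, 51.448 tokens-per-sec
Generation: 300 tokens, 35.469 tokens-per-sec
Peak memory: 14.531 GB
"""

# Matches "<phase>: <n> tokens, <rate> tokens-per-sec" (assumed line format).
# The "Peak memory" line has no token count, so it is skipped.
STAT = re.compile(r"^(\w+): (\d+) tokens, ([\d.]+) tokens-per-sec$", re.M)


def phase_seconds(text):
    """Return {phase: seconds}, computed as tokens / tokens-per-sec."""
    return {name: int(n) / float(rate) for name, n, rate in STAT.findall(text)}


if __name__ == "__main__":
    for phase, secs in phase_seconds(OUTPUT).items():
        print(f"{phase}: {secs:.2f} s")
```

For the numbers in this comment that works out to roughly 0.25 s for the prompt and about 8.5 s for the 300 generated tokens.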