|
|
|
|
|
by oxcidized
228 days ago
|
|
> That's small enough to run well on ~$5,000 of hardware... Honestly curious where you got this number. Unless you're talking about extremely small quants. Even just a Q4 quant gguf is ~130GB. Am I missing out on a relatively cheap way to run models well that are this large? I suppose you might be referring to a Mac Studio, but (while I don't have one to be a primary source of information) it seems like there is some argument to be made on whether they run models "well"? |
|
An M3 Ultra with 256GB of RAM is $5599. That should just about be enough to fit MiniMax M2 at 8bit for MLX: https://huggingface.co/mlx-community/MiniMax-M2-8bit
Or maybe run a smaller quantized one to leave more memory for other apps!
Here are performance numbers for the 4bit MLX one: https://x.com/ivanfioravanti/status/1983590151910781298 - 30+ tokens per second.