An M3 Ultra with 256GB of RAM is $5599. That should be just about enough to fit MiniMax M2 at 8-bit for MLX: https://huggingface.co/mlx-community/MiniMax-M2-8bit
Or maybe run a smaller quantization to leave more memory for other apps!
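For anyone who wants to sanity-check the "just about enough" part, here is a rough back-of-the-envelope sketch. It assumes MiniMax M2 has roughly 230B total parameters (that figure is not from this thread, so double-check it against the model card), and it only counts the raw weights. Real quantized files also carry scales, and the KV cache and OS need room on top, so treat these as lower bounds.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate raw weight storage in GB (1 GB = 1e9 bytes).

    Ignores quantization scales, KV cache, and runtime overhead,
    so the real footprint will be somewhat higher.
    """
    return params_billions * bits_per_weight / 8


# Assumption: ~230B total parameters (not stated in the thread).
ASSUMED_PARAMS_B = 230
RAM_GB = 256  # the M3 Ultra configuration mentioned above

for bits in (8, 4):
    need = weight_memory_gb(ASSUMED_PARAMS_B, bits)
    print(f"{bits}-bit: ~{need:.0f} GB of weights, ~{RAM_GB - need:.0f} GB left of {RAM_GB} GB")
```

Under that assumption the 8-bit weights alone take most of the 256GB, which is why a 4-bit quantization is the more comfortable choice if you want anything else running.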
Here are performance numbers for the 4-bit MLX version: https://x.com/ivanfioravanti/status/1983590151910781298 - 30+ tokens per second.
30 tokens per second looks good until you have to wait minutes for the first token.
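To put rough numbers on that: time to first token is dominated by prompt processing (prefill), which is a separate speed from generation. A minimal sketch, where the prompt length and prefill speed are made-up illustrative values and not measurements of this model or machine; only the 30 tokens per second decode figure comes from the thread.

```python
def time_to_first_token_s(prompt_tokens: int, prefill_tokens_per_s: float) -> float:
    """Seconds spent processing the prompt before any output appears."""
    return prompt_tokens / prefill_tokens_per_s


def total_time_s(prompt_tokens: int, output_tokens: int,
                 prefill_tokens_per_s: float, decode_tokens_per_s: float) -> float:
    """Prefill time plus generation time for the whole response."""
    return (time_to_first_token_s(prompt_tokens, prefill_tokens_per_s)
            + output_tokens / decode_tokens_per_s)


# Hypothetical inputs for illustration only (not benchmarks):
PROMPT_TOKENS = 50_000      # e.g. a large coding-agent context
PREFILL_TOK_S = 400.0       # assumed prompt-processing speed
DECODE_TOK_S = 30.0         # generation speed quoted in the thread
OUTPUT_TOKENS = 600

ttft = time_to_first_token_s(PROMPT_TOKENS, PREFILL_TOK_S)
total = total_time_s(PROMPT_TOKENS, OUTPUT_TOKENS, PREFILL_TOK_S, DECODE_TOK_S)
print(f"time to first token: {ttft:.0f} s, total: {total:.0f} s")
```

With long prompts the prefill term swamps the generation term, so the prompt-processing speed is the number to look for in benchmarks, not just tokens per second of output.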