Hacker News
by simonw 408 days ago
I really like Mistral Small 3.1 (I have a 64GB M2 as well). Qwen 3 is worth trying in different sizes too.

I don't know if they'll be good enough for general coding tasks though - I've been spoiled by API access to Claude 3.7 Sonnet and o4-mini and Gemini 2.5 Pro.

How do you determine peak memory usage? Just look at Activity Monitor?

I've yet to find a good overview of how much memory each model needs at different context lengths (other than the back-of-the-envelope #weights * bits). LM Studio warns you if a model likely won't fit, but its estimate isn't very exact.
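
For what it's worth, the back-of-the-envelope estimate can be written down in a few lines. This is only a rough sketch: the weight term is the #weights * bits rule from above, and the context-length term assumes a standard transformer KV cache (one key and one value tensor per layer) stored at 2 bytes per value. The function names and the example shape (40 layers, 8 KV heads, head size 128) are made up for illustration and don't describe any particular model; real runtimes add overhead on top.

```python
GIB = 1024**3


def weight_bytes(num_params: int, bits_per_weight: int) -> int:
    """Back-of-the-envelope weight memory: #weights * bits, converted to bytes."""
    return num_params * bits_per_weight // 8


def kv_cache_bytes(
    num_layers: int,
    num_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,  # assumption: 16-bit cache entries
) -> int:
    """Rough KV cache size for one sequence at a given context length."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * context_len * bytes_per_value


if __name__ == "__main__":
    # Hypothetical 24B-parameter model quantized to 4 bits per weight.
    weights = weight_bytes(24_000_000_000, 4)
    # Hypothetical architecture numbers, chosen only to show the scaling.
    cache = kv_cache_bytes(num_layers=40, num_kv_heads=8, head_dim=128, context_len=32_768)
    print(f"weights:  {weights / GIB:.1f} GiB")
    print(f"KV cache: {cache / GIB:.1f} GiB")
    print(f"total:    {(weights + cache) / GIB:.1f} GiB (before runtime overhead)")
```

The cache term grows linearly with context length, which is why the same model can fit at 8k context and not at 128k.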

MLX reports peak memory usage at the end of the response. Otherwise I'll use Activity Monitor.
I'm also trusting `get_peak_memory` + some small buffer for now.

That said, it reports peak memory accurately for tensors living on the GPU, but seems to miss some of the non-Metal overhead, however small (https://github.com/aukejw/mlx_transformers_benchmark/issues/...).
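
A minimal sketch of the "`get_peak_memory` + some small buffer" approach, in case it's useful. Assumptions: recent MLX releases expose the counter as `mlx.core.get_peak_memory()` and older ones as `mlx.core.metal.get_peak_memory()`, so the code tries both and returns `None` if neither is available. The helper names and the 1 GiB default buffer are arbitrary choices for illustration, not anything MLX recommends.

```python
from typing import Optional

GIB = 1024**3


def read_mlx_peak_bytes() -> Optional[int]:
    """Return MLX's reported peak memory in bytes, or None if unavailable."""
    try:
        import mlx.core as mx
    except ImportError:
        return None
    # Assumption: top-level function in newer releases, mx.metal in older ones.
    getter = getattr(mx, "get_peak_memory", None)
    if getter is None:
        getter = getattr(getattr(mx, "metal", None), "get_peak_memory", None)
    return int(getter()) if getter is not None else None


def required_bytes(peak_bytes: int, buffer_bytes: int = GIB) -> int:
    """Reported peak plus a safety buffer for overhead the counter misses."""
    return peak_bytes + buffer_bytes


def fits(peak_bytes: int, available_bytes: int, buffer_bytes: int = GIB) -> bool:
    """True if the peak plus buffer stays within the available memory."""
    return required_bytes(peak_bytes, buffer_bytes) <= available_bytes


if __name__ == "__main__":
    peak = read_mlx_peak_bytes()
    if peak is None:
        print("MLX not available; pass in a peak value measured some other way.")
    else:
        print(f"peak: {peak / GIB:.2f} GiB, with buffer: {required_bytes(peak) / GIB:.2f} GiB")
```

Since the counter only covers what MLX allocates itself, the buffer is standing in for the non-Metal overhead mentioned above; how large it needs to be is something to measure rather than assume.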