| HN Mirror



	by simonw 408 days ago
	I really like Mistral Small 3.1 (I have a 64GB M2 as well). Qwen 3 is worth trying in different sizes too. I don't know if they'll be good enough for general coding tasks though - I've been spoiled by API access to Claude 3.7 Sonnet and o4-mini and Gemini 2.5 Pro.

1 comments

aukejw 407 days ago

How do you determine peak memory usage? Do you just look at Activity Monitor?

I've yet to find a good overview of how much memory each model needs for different context lengths (other than back of the envelope #weights * bits). LM Studio warns you if a model will likely not fit, but it's not very exact.
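For reference, the back-of-the-envelope estimate can be sketched as below. This is a rough illustration, not a measured result: the KV cache formula assumes a standard transformer with one key and one value tensor per layer stored in 16-bit precision, and the model shape (24B parameters, 40 layers, 8 KV heads, head dimension 128) is hypothetical. The function names are made up for this example. It ignores activations and framework overhead.

```python
GIB = 2**30


def estimate_weights_gib(n_params: float, bits_per_weight: float) -> float:
    """Memory for the weights alone: #weights * bits, converted to GiB."""
    return n_params * bits_per_weight / 8 / GIB


def estimate_kv_cache_gib(
    n_layers: int,
    n_kv_heads: int,
    head_dim: int,
    context_len: int,
    bytes_per_value: int = 2,  # assumption: 16-bit cache entries
) -> float:
    """KV cache size for one sequence; grows linearly with context length."""
    # Factor 2: one key tensor and one value tensor per layer.
    total_bytes = 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_value
    return total_bytes / GIB


# Hypothetical model shape, for illustration only.
weights = estimate_weights_gib(n_params=24e9, bits_per_weight=4)
kv = estimate_kv_cache_gib(n_layers=40, n_kv_heads=8, head_dim=128, context_len=32_768)

if __name__ == "__main__":
    print(f"weights ~ {weights:.1f} GiB, KV cache ~ {kv:.1f} GiB")
```

The weights term is fixed per model and quantization, while the KV cache term is the part that scales with context length, which is why a single "model size" number is not enough.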


simonw 407 days ago

MLX reports peak memory usage at the end of the response. Otherwise I'll use Activity Monitor.


aukejw 407 days ago

I'm also trusting `get_peak_memory` + some small buffer for now.

That said, it reports peak memory accurately for tensors living on the GPU but seems to miss some of the non-Metal overhead, however small (https://github.com/aukejw/mlx_transformers_benchmark/issues/...).
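A minimal sketch of the "`get_peak_memory` + some small buffer" approach might look like this. The helper names and the 10% default buffer are assumptions for illustration, not anything MLX prescribes. The lookup tries `mx.get_peak_memory` first and falls back to `mx.metal.get_peak_memory`, since where the function lives depends on the MLX version, and it returns `None` when MLX is not installed.

```python
from typing import Optional


def budget_with_buffer(peak_bytes: int, buffer_fraction: float = 0.1) -> int:
    """Pad a measured peak by a safety margin (the 10% default is arbitrary)."""
    return int(peak_bytes * (1 + buffer_fraction))


def mlx_peak_bytes() -> Optional[int]:
    """Return MLX's reported peak memory in bytes, or None if unavailable."""
    try:
        import mlx.core as mx
    except ImportError:
        return None
    # Assumption: newer MLX exposes get_peak_memory at the top level,
    # older releases expose it under mx.metal.
    getter = getattr(mx, "get_peak_memory", None)
    if getter is None:
        metal = getattr(mx, "metal", None)
        getter = getattr(metal, "get_peak_memory", None)
    return getter() if getter is not None else None


if __name__ == "__main__":
    peak = mlx_peak_bytes()  # call this after running generation
    if peak is None:
        print("MLX not available; fall back to Activity Monitor")
    else:
        print(f"peak: {peak / 2**30:.2f} GiB, "
              f"budget: {budget_with_buffer(peak) / 2**30:.2f} GiB")
```

The buffer is there to cover the overhead outside MLX's own accounting, so the padded number is a budget rather than a measurement.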
