by drob518 100 days ago
Have you used local AI models on a 32 GB MBP? I ask because I'm looking to finally upgrade my M1 Air, which I love but which only has 16 GB of RAM. I'm trying to decide whether to bump to 32 GB with the M5 MBAir or jump all the way to 64 GB with the low-end M5 MBP. I don't typically tax the CPU much, but I'm starting to look at running local models, and for that I'd like something faster with more memory. That said, I don't want to overpay. Memory is my main constraint right now. Anyway, if you have experience, I'd love to hear it: which MBP, system specs, which AI model, how fast it ran, etc.

For local models, which of these do you want to do?

A) Embeddings.

B) Things like classification, structured outputs, image labelling etc.

C) Image generation.

D) LLM chatbot for answering questions, improving email drafts etc.

E) Agentic coding.

I have an MBP with an M1 Max and 32GB RAM. I can run a 20GB mlx_vlm model like mlx-community/Qwen3.5-35B-A3B-4bit. But:

- it's not very fast

- the context window is small

- it's not useful for agentic coding

I asked "What was mary j blige's first album?" and it output 332 tokens (mostly reasoning) and the correct answer.

mlx_vlm reported:

  Prompt: 20 tokens @ 28.5 t/s | Generation: 332 tokens @ 56.0 t/s | Peak memory: 21.67 GB
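
To put those throughput numbers in wall-clock terms, here is a quick sketch that converts them to seconds. It only divides the token counts by the reported rates, so it ignores model load time and any other overhead mlx_vlm doesn't report; the function name is just for illustration.

```python
def seconds(tokens: int, tokens_per_second: float) -> float:
    """Time to process `tokens` at a steady rate, ignoring startup overhead."""
    return tokens / tokens_per_second

# Figures taken from the mlx_vlm report above.
prompt_s = seconds(20, 28.5)   # prompt processing
gen_s = seconds(332, 56.0)     # generation, mostly reasoning tokens

print(f"prompt: {prompt_s:.1f}s, generation: {gen_s:.1f}s, total: {prompt_s + gen_s:.1f}s")
```

So that one short question took roughly 6 to 7 seconds end to end, most of it spent generating reasoning tokens.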

Thanks for the info.

Agentic coding is my top priority, with chatbot and classification as lower priorities. I don’t really care about image gen.

Also, if you’re only able to run 35B models in 32GB, it seems like I’d definitely want at least 64GB for the newer, larger models (Qwen has a 122B model, right?). My theory is that models are only getting larger, though perhaps also more efficient.
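
A back-of-the-envelope check on that sizing, as a rough sketch: it assumes weights dominate memory and that a "4-bit" model costs exactly 4 bits per parameter. Real quantized checkpoints also store scales and other metadata, and the KV cache, the OS, and other apps need room too, so treat these as lower bounds. The function name is made up for illustration.

```python
def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# Assumed: 4 bits per weight with no quantization overhead.
for name, params in [("35B", 35), ("122B", 122)]:
    print(f"{name} @ 4-bit: ~{weight_memory_gb(params, 4):.1f} GB of weights")
```

By this estimate the 35B model needs about 17.5 GB for weights, which is in the same ballpark as the 21.67 GB peak reported above once overhead is added. A 122B model at 4 bits would need about 61 GB for weights alone, so it would be very tight on a 64GB machine.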