Hacker News
by geuis 900 days ago
I have a 2020 16in MacBook Pro. I think it's the last generation of Intel chips. I've been struggling to get some of the LLM models like Mixtral to run on it.

I hate the idea of needing to buy another $3k laptop less than 4 years after spending that much on my current machine. But if I want to get serious about developing non-ChatGPT services, do I need a new M2 or M3 chip to get this stuff running locally?

9 comments

We should be happy that compute is once again improving and machines are getting outdated rapidly. Which is better - a world where your laptop is competitive for 5+ years but everything stays the same? Or one where entire new realms of advancement open up every 18 months?

It’s no contest: option 2 for me.

Just use llama.cpp with any of the available UIs. It will be usable with 4-bit quantization on CPU. You can use any of the “Q4_K_M” GGUF models that TheBloke puts out on Hugging Face.

https://github.com/ggerganov/llama.cpp

UI projects in description.

https://huggingface.co/TheBloke

A closed source option is LMStudio.

https://lmstudio.ai/
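
For the llama.cpp route above, here is a minimal sketch of how a command line for its example program might be assembled. The binary path `./main` and the model filename are assumptions for illustration (the binary name depends on the llama.cpp version and build), and `build_llama_cpp_command` is a hypothetical helper, not part of llama.cpp.

```python
import shlex


def build_llama_cpp_command(model_path, prompt, n_predict=128, threads=8,
                            gpu_layers=0, binary="./main"):
    """Assemble an argument list for llama.cpp's command-line example program.

    `binary` is an assumed path; adjust it to wherever your build put the executable.
    """
    cmd = [
        binary,
        "-m", model_path,      # path to a GGUF model file
        "-p", prompt,          # prompt text
        "-n", str(n_predict),  # number of tokens to generate
        "-t", str(threads),    # CPU threads to use
    ]
    if gpu_layers > 0:
        # Only relevant on builds with GPU (e.g. Metal) support.
        cmd += ["-ngl", str(gpu_layers)]
    return cmd


# Hypothetical model filename, CPU-only settings.
cmd = build_llama_cpp_command("models/example.Q4_K_M.gguf", "Hello", threads=8)
print(shlex.join(cmd))
```

On an Intel Mac you would likely leave `gpu_layers` at 0 and tune `threads` to the number of physical cores.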

“New realms of advancement” could open up because of faster computation algorithms. Those hypothetical scenarios don’t have to be mutually exclusive.
I love this perspective! It makes me really happy about the advancements going around, and not sad about my M1 MacBook getting old.
I'd suggest using a cloud VM with a GPU attached. For normal stuff like LLM inference, I just rent an instance with a small (cheap) GPU. But when I need to do something more exotic like train an image model from scratch, I can temporarily spin up a cluster that has high-end expensive A100s. This way I don't have to invest in expensive hardware like an M3 that can still only do a small part of the full range.
You can do a lot with either a VM instance with a GPU or within Google Colab. If you are just starting and doing this stuff mostly a few hours a week, I'd recommend going that way for a while.
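
To put rough numbers on the rent-versus-buy trade-off, here is a back-of-the-envelope sketch. The $3k figure comes from the original post; the hourly rate and weekly hours are purely hypothetical placeholders, so substitute real quotes from whichever provider you use.

```python
def breakeven_weeks(hardware_cost, hourly_rate, hours_per_week):
    """Weeks of rental after which cumulative rental cost equals the hardware price."""
    return hardware_cost / (hourly_rate * hours_per_week)


# $3,000 laptop vs. a HYPOTHETICAL $1.00/hour GPU instance used 5 hours a week.
print(breakeven_weeks(3000, 1.00, 5))
```

This ignores resale value, electricity, and storage fees, so treat it as a first-pass estimate only.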
If you want to run local, I’d get an M2 with 64GB of RAM. That will enable you to run 30B models and Mixtral 8x7B. You need around 50GB to run those at 5/6-bit quant.
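
As a rough sanity check on memory needs, the size of the quantized weights alone can be estimated as parameters times bits per weight divided by 8. The sketch below assumes Mixtral 8x7B's roughly 46.7B total parameters; the bits-per-weight values are approximate averages I am assuming for 4-, 5-, and 6-bit GGUF quants, not exact figures.

```python
def weight_memory_gb(params_billions, bits_per_weight):
    """Approximate size of the quantized weights alone, in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8


# Mixtral 8x7B: ~46.7B total parameters. Bits-per-weight values are assumed averages.
for bits in (4.5, 5.5, 6.5):
    print(f"{bits} bits/weight: ~{weight_memory_gb(46.7, bits):.0f} GB")
```

The context (KV) cache, runtime buffers, and the OS itself all need memory on top of the weights, so real headroom requirements are higher than this estimate.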

I’m getting about 20 tokens/second on my 64GB M2 MBP with a Mixtral Q5_K_M GGUF in llama.cpp via text-generation-webui, with 35(?) layers offloaded to Metal for acceleration.

I’m really pleased with the performance compared to my dual 3090 desktop rig; the MBP is actually faster.

Data point: my MacBook Pro 16" with the M3 Max (64GB) runs 34b model inference about as fast (or slightly faster) as ChatGPT runs GPT-4.

I am now running phind-codellama:34b-v2-q8_0 through ollama and the experience is very good.

All that said, though, every model I tried couldn't hold a candle to GPT-4: they all produce crappy results, aren't good at translation, and can't really do much for me. They are toys, I go "ooh" and "aah" over them, then realize they aren't that useful and go back to using GPT-4.

Perhaps 34B is still not enough to get anything reasonable.

Ollama (https://ollama.ai/) is a popular choice for running local LLMs and should work fine on Intel. It's essentially a wrapper around llama.cpp with a Docker-like workflow, so it shouldn't require an M2/M3.
On your CPU, you should be able to leverage the same AVX acceleration used on Linux and Windows machines. It's not going to make any GPU owners envious, but it might be enough to keep you satisfied with your current hardware.
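
If you want to check which AVX variants your CPU reports, here is a best-effort sketch. It assumes the `machdep.cpu.features` and `machdep.cpu.leaf7_features` sysctl keys on Intel macOS and `/proc/cpuinfo` on Linux; `avx_flags` and `read_cpu_features` are hypothetical helper names, and on other systems the script simply prints an empty list.

```python
import platform
import subprocess


def avx_flags(feature_text):
    """Return the sorted, de-duplicated AVX-related flags in a whitespace-separated string."""
    return sorted({tok.lower() for tok in feature_text.split()
                   if tok.lower().startswith("avx")})


def read_cpu_features():
    """Best-effort read of the CPU feature list; returns '' if it cannot be read."""
    system = platform.system()
    try:
        if system == "Darwin":
            # These keys are assumed to exist on Intel Macs only.
            out = subprocess.run(
                ["sysctl", "-n", "machdep.cpu.features", "machdep.cpu.leaf7_features"],
                capture_output=True, text=True, check=False,
            )
            return out.stdout
        if system == "Linux":
            with open("/proc/cpuinfo") as f:
                return " ".join(line.split(":", 1)[1]
                                for line in f if line.startswith("flags"))
    except OSError:
        pass
    return ""


print(avx_flags(read_cpu_features()))
```

Seeing `avx2` in the output suggests the CPU can use the faster vectorized code paths that CPU inference builds typically rely on.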
AVX code on laptop cooling sounds like it could be even slower! I don’t miss the heat from an Intel laptop!
It runs faster and cooler than the software-accelerated alternative. Probably cooler than my 3070 too; my laptop sat at ~50°C when using AVX to generate Stable Diffusion Turbo images.
An external Thunderbolt GPU should work with an Intel MacBook Pro.
Does your Mac support an external GPU? A mid- to high-end Nvidia card may or may not outperform the M3 GPU at a lower or similar price. You can also stick it in a PC or resell it separately.
My 64GB M2 MBP is faster running inference than my dual 3090 desktop rig, and with 64GB of unified memory it can hold slightly bigger models than the 48GB of VRAM in the desktop. The performance of the M2/M3 with a big unified memory is very impressive. Not much difference between M2 and M3 though, if all other things are the same.
Do you recommend any specific external GPU? I had one from Blackmagic; it was not that great performance-wise.
No Nvidia drivers for macOS.
Could dual-boot Windows or Linux.
eGPU isn’t supported on Apple silicon.
As GP said, the early 2020 MBP had an Intel CPU.