| HN Mirror



	by geuis 900 days ago
	I have a 2020 16in MacBook Pro. I think it's the last generation of Intel chips. I've been struggling to get some of the LLM models like Mixtral to run on it. I hate the idea of needing to buy another $3k laptop less than 4 years after spending that much on my current machine. But if I want to get serious about developing non-chatgpt services, do I need a new M2 or M3 chip to get this stuff running locally?

9 comments

kiratp 900 days ago

We should be happy that compute is once again improving and machines are getting outdated rapidly. Which is better - a world where your laptop is competitive for 5+ years but everything stays the same? Or one where entire new realms of advancement open up every 18 months?

It’s no contest: option 2 for me.

Just use llama.cpp with any of the available UIs. It will be usable with 4-bit quantization on CPU. You can use any of the “Q4_K_M” GGUF models that TheBloke puts out on Hugging Face.

https://github.com/ggerganov/llama.cpp

UI projects in description.

https://huggingface.co/TheBloke

A closed-source option is LM Studio.

https://lmstudio.ai/
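
For readers who want to try the CPU-only route described above without a UI, here is a minimal sketch that builds a llama.cpp command line from Python. The binary name and the model filename are assumptions: older llama.cpp builds name the binary `./main`, newer ones `llama-cli`, and the GGUF path is a hypothetical example. The flags used (`-m`, `-p`, `-n`, `-t`, `-ngl`) are standard llama.cpp options.

```python
import shlex


def llama_cpp_command(model_path, prompt, binary="./llama-cli",
                      n_predict=128, threads=8, gpu_layers=0):
    """Build an argument list for the llama.cpp command-line tool.

    `binary` is an assumption: older builds call it ./main.
    """
    return [
        binary,
        "-m", model_path,          # path to a quantized GGUF file
        "-p", prompt,              # prompt text
        "-n", str(n_predict),      # maximum number of tokens to generate
        "-t", str(threads),        # number of CPU threads
        "-ngl", str(gpu_layers),   # layers offloaded to the GPU; 0 keeps everything on the CPU
    ]


# Hypothetical model filename, for illustration only.
cmd = llama_cpp_command("models/mixtral-8x7b-instruct.Q4_K_M.gguf", "Hello")
print(shlex.join(cmd))
```

Passing `cmd` to `subprocess.run` would start generation. On an Intel Mac, `gpu_layers=0` is the conservative choice; a non-zero value only helps when the build has a working GPU backend.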

elicksaur 900 days ago

“New realms of advancement” could open up because of faster computation algorithms. Those hypothetical scenarios don’t have to be mutually exclusive.

seanvelasco 900 days ago

i love this perspective! makes me really happy about the advancements going around, and not feel sad about my macbook m1 getting old

jey 900 days ago

I'd suggest using a cloud VM with a GPU attached. For normal stuff like LLM inference, I just rent an instance with a small (cheap) GPU. But when I need to do something more exotic like train an image model from scratch, I can temporarily spin up a cluster that has high-end expensive A100s. This way I don't have to invest in expensive hardware like an M3 that can still only do a small part of the full range.

elzbardico 900 days ago

You can do a lot with either a VM instance with a GPU or within Google Colab. If you are just starting and doing this stuff only a few hours a week, I'd recommend going that way for a while.

K0balt 900 days ago

If you want to run local, I’d get an M2 with 64GB of RAM. That will enable you to run 30B models and Mixtral 8x7B. You need around 50GB to run those at 5- or 6-bit quantization.

I’m getting about 20 tokens/second on my 64GB M2 MBP with a Mixtral Q5_K_M GGUF in llama.cpp using text-generation-webui, with 35(?) layers being sent to Metal for acceleration.

I’m really pleased with the performance compared to my dual 3090 desktop rig, the mbp is actually faster.
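
The memory figure in the comment above can be sanity-checked with back-of-the-envelope arithmetic: quantized weights take roughly (parameter count × bits per weight) / 8 bytes. The sketch below assumes about 46.7 billion total parameters for Mixtral 8x7B (the publicly reported figure, not something stated in this thread) and counts weights only. The KV cache, runtime buffers, and the OS come on top, and K-quant formats use slightly more than their nominal bit width, so a real budget needs headroom above these numbers.

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate size of quantized weights in GB (10^9 bytes).

    params_billion * 1e9 weights * bits / 8 bytes, divided by 1e9.
    """
    return params_billion * bits_per_weight / 8


MIXTRAL_PARAMS_B = 46.7  # assumed total parameter count, in billions

for bits in (4, 5, 6, 8):
    size = weight_memory_gb(MIXTRAL_PARAMS_B, bits)
    print(f"{bits}-bit: about {size:.1f} GB of weights")
```

The same function applies to the dense 30B-class models mentioned above, for example `weight_memory_gb(30, 6)`.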

jwr 900 days ago

Data point: my MacBook Pro 16" with the M3 Max (64GB) runs 34b model inference about as fast as (or slightly faster than) ChatGPT runs GPT-4.

I am now running phind-codellama:34b-v2-q8_0 through ollama and the experience is very good.

All that said, though, every model I tried couldn't hold a candle to GPT-4: they all produce crappy results, aren't good at translation, and can't really do much for me. They are toys, I go "ooh" and "aah" over them, then realize they aren't that useful and go back to using GPT-4.

Perhaps 34B is still not enough to get anything reasonable.
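
For anyone who wants to script against a model served this way, the following is a minimal sketch of calling a local ollama server over its HTTP API from Python. It assumes the server is running on the default port 11434 and that the model tag from the comment above has already been pulled; the prompt text is a made-up example.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint


def build_generate_request(model, prompt):
    """Build a non-streaming generate request for a local ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model, prompt):
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return json.loads(resp.read())["response"]


# Building the request does not touch the network; calling generate() does.
req = build_generate_request("phind-codellama:34b-v2-q8_0",
                             "Write a binary search in Python.")
```

With `"stream": False` the server returns a single JSON object instead of a stream of chunks, which keeps the client code short at the cost of waiting for the full completion.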

iepathos 900 days ago

ollama https://ollama.ai/ is a popular choice for running local LLMs and should work fine on Intel. It's just wrapping docker so shouldn't require M2/M3.

smoldesu 900 days ago

On your CPU, you should be able to leverage the same AVX acceleration used on Linux and Windows machines. It's not going to make any GPU owners envious, but it might be enough to keep you satisfied with your current hardware.

ace2358 900 days ago

AVX code on laptop cooling sounds like it could be even slower! I don’t miss the heat from an Intel laptop!

smoldesu 900 days ago

It runs faster and cooler than the software-accelerated alternative. Probably cooler than my 3070 too; my laptop sat at ~50°C when using AVX to generate Stable Diffusion Turbo images.

j45 900 days ago

An external Thunderbolt GPU should work with an Intel MacBook Pro.

muricula 900 days ago

Does your Mac support an external GPU? A mid to high end Nvidia card may or may not outperform the M3 GPU at a lower or similar price. You can also stick it in a PC or resell it separately.

K0balt 900 days ago

My 64GB M2 MBP is faster running inference than my dual 3090 desktop rig, and at 64GB of unified memory it can hold slightly bigger models than the 48GB of VRAM of the desktop. The performance of the M2/M3 with a big unified memory is very impressive. Not much difference between M2 and M3 though, if all other things are the same.

xfitm3 900 days ago

Do you recommend any specific external GPU? I had one from Black Magic, it was not that great performance wise.

kiratp 900 days ago

No Nvidia drivers for macOS.

lights0123 900 days ago

Could dual boot Windows or Linux.

fnordpiglet 900 days ago

eGPU isn’t supported on Apple silicon

sp332 900 days ago

As GP said, the early 2020 MBP had an Intel CPU.