HN Mirror



	by tric 1172 days ago
	Why is there so much focus on running GPT models on macOS? Is there something special about Apple's new chips, or about macOS?


19h 1172 days ago

Unified memory lets the CPU and GPU share the same memory, so a MacBook with 96GB of memory effectively has 96GB of VRAM (minus OS overhead, obviously).

wmf 1172 days ago

Apple's unified memory should allow running large models like 65B that will not fit on a consumer GPU, but mostly I see people talking about the smaller 7B sizes that can run anywhere.

matwood 1172 days ago

The shared RAM and Neural Engine make for an interesting and powerful platform, if people are willing to port to it.

steve_adams_86 1172 days ago

Can third parties leverage the Neural Engine yet? I thought there was no API available for it.

rnosov 1172 days ago

They are leveraging Apple’s Metal Performance Shaders[1], not the Neural Engine. From the chart, it looks like you might get up to a ~20x inference speedup over plain CPU. Obviously, it's not like having an RTX 4090, but it's better than nothing.

[1] https://pytorch.org/blog/introducing-accelerated-pytorch-tra...
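For readers wondering what using the Metal Performance Shaders backend looks like from PyTorch, here is a minimal sketch, assuming PyTorch 1.12 or later (the release that added the `mps` device). It prefers `mps`, then CUDA, then the CPU, and falls back to the CPU if PyTorch is not installed. The `pick_device` helper is a hypothetical name, and the speedup you actually get depends on the model; nothing here measures it.

```python
# Minimal sketch: pick a PyTorch device, preferring Apple's Metal (MPS) backend.
# Assumes PyTorch >= 1.12 when installed; falls back to "cpu" otherwise.


def pick_device() -> str:
    """Hypothetical helper: return "mps", "cuda", or "cpu"."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch not installed: nothing to accelerate with
    # Older PyTorch builds have no torch.backends.mps at all, so guard the lookup.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"  # Apple silicon GPU via Metal Performance Shaders
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print("using device:", device)
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        x = torch.randn(1024, 1024, device=device)
        y = x @ x  # executes on the GPU through Metal when device == "mps"
        print(tuple(y.shape))
```

Falling back instead of failing keeps the same script usable on Intel Macs and Linux boxes; only the device string changes.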

Core ML is the API.

> Why is there so much focus on running GPT models on Mac OS?

Because a MacBook with 96GB of RAM is cheaper than a GPU with anything close to that.

rnk 1171 days ago

So the question is: how much RAM do you need? You and another person mentioned 96GB, but the person below says he can run it with 24GB. What's a good amount of RAM to get right now? Of course 128GB (the max) is best, but what's a great amount to have today? I never bought an M1 and am thinking of buying one now ;-)

nickthegreek 1172 days ago

I can run the 30B 4-bit model on my M2 Air, which has 24GB of RAM.
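A rough way to answer the how-much-RAM question is to multiply the parameter count by the bits per weight. The sketch below is a back-of-the-envelope estimate only: it counts model weights alone (no KV cache, activations, or runtime buffers), ignores the per-block scales that quantized formats store, and the 80% headroom default in the hypothetical `fits` helper is a rule of thumb, not a measured figure. Memory sizes are treated as GiB.

```python
# Back-of-the-envelope memory estimate for LLM weights.
# Assumptions (not measurements): counts weights only, ignoring the KV cache,
# activations, runtime buffers, and per-block quantization scales.

GIB = 1024 ** 3  # bytes per GiB


def weight_memory_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / GIB


def fits(params_billion: float, bits_per_weight: float,
         memory_gib: float, headroom: float = 0.8) -> bool:
    """True if the weights fit within `headroom` of the available memory.

    The default 0.8 (leave ~20% for the OS and runtime) is a hypothetical
    rule of thumb.
    """
    return weight_memory_gib(params_billion, bits_per_weight) <= memory_gib * headroom


if __name__ == "__main__":
    for size in (7, 13, 30, 65):   # parameter counts in billions
        for bits in (16, 4):       # 16-bit weights vs. 4-bit quantized
            gib = weight_memory_gib(size, bits)
            print(f"{size:>2}B @ {bits:>2}-bit: {gib:6.1f} GiB | "
                  f"fits 24 GiB: {fits(size, bits, 24)} | "
                  f"fits 96 GiB: {fits(size, bits, 96)}")
```

Under these assumptions, 30B at 4-bit comes to about 14 GiB of weights, which is consistent with it running on a 24GB M2 Air, while 65B needs about 30 GiB even at 4-bit (more than a 24GB consumer GPU) and about 121 GiB at 16-bit (more than 96GB). To approximate a specific quantized file more closely, pass a bits value slightly above 4.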

jpp4e 1170 days ago

Hi nickthegreek!

Could you tell me how you did that? Did you use FastChat or something else? Which model to download? What command to run?

Thank you!!!

nickthegreek 1169 days ago

https://huggingface.co/Pi3141/alpaca-lora-30B-ggml

I believe I’m using alpaca.cpp with the command:

./chat -m <bin filename>
