Hacker News
by tric 1172 days ago
Why is there so much focus on running GPT models on Mac OS? Is there something special about Apple's new chip, or Mac OS?

Unified memory lets the CPU and GPU share the same memory, which effectively gives a MacBook with 96GB of RAM 96GB of VRAM (minus OS overhead, obviously).
Apple's unified memory should allow running large models, like the 65B-parameter ones, that will not fit on a consumer GPU. Yet mostly I see people talking about the smaller 7B models, which can run anywhere.
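To put rough numbers on this: the weights alone take about params * bits_per_weight / 8 bytes. The sketch below is an estimate only. It ignores the KV cache, activations and runtime overhead, and the 24GB budget is an assumption standing in for a high-end consumer card such as the RTX 4090. The helper names are hypothetical.

```python
# Rough weight-only memory estimate for LLMs (a sketch; ignores KV cache,
# activations and runtime overhead, all of which add to the real footprint).


def weight_memory_gb(params_billions: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    total_bytes = params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9


def fits(params_billions: float, bits_per_weight: float, budget_gb: float) -> bool:
    """True if the weights alone fit within the given memory budget."""
    return weight_memory_gb(params_billions, bits_per_weight) <= budget_gb


if __name__ == "__main__":
    CONSUMER_GPU_GB = 24  # assumption: a 24GB card such as an RTX 4090
    UNIFIED_GB = 96       # the 96GB MacBook from the thread, before OS overhead
    for size in (7, 65):
        for bits in (16, 4):
            gb = weight_memory_gb(size, bits)
            print(f"{size}B @ {bits}-bit: {gb:.1f} GB | "
                  f"fits 24GB GPU: {fits(size, bits, CONSUMER_GPU_GB)} | "
                  f"fits 96GB unified: {fits(size, bits, UNIFIED_GB)}")
```

By this estimate a 65B model needs about 130GB at 16-bit, so even 96GB of unified memory only holds it once quantized (about 32.5GB at 4-bit), while a 4-bit 7B model (about 3.5GB) fits nearly anywhere.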
The shared RAM and the Neural Engine make for an interesting, powerful platform, if people are willing to port to it.
Can third parties use the Neural Engine yet? I thought there was no API for it.
They are using Apple's Metal Performance Shaders (MPS) [1], not the Neural Engine. From the chart, it looks like you might get up to a ~20x inference speedup over plain CPU. Obviously, it's not like having an RTX 4090, but it's better than nothing.

[1] https://pytorch.org/blog/introducing-accelerated-pytorch-tra...
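For reference, in PyTorch the Metal backend shows up as the "mps" device. A minimal sketch, assuming PyTorch 1.12 or later (the release that introduced that device); pick_device is a hypothetical helper name, and it falls back to "cpu" if torch isn't installed.

```python
# Sketch: prefer PyTorch's Metal (MPS) backend on Apple silicon, else CUDA,
# else CPU. Returns "cpu" when torch is not installed at all.


def pick_device() -> str:
    """Return the best available PyTorch device name: 'mps', 'cuda', or 'cpu'."""
    try:
        import torch
    except ImportError:
        return "cpu"
    # torch.backends.mps only exists in PyTorch >= 1.12, so look it up defensively.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    device = pick_device()
    print(f"Using device: {device}")
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None:
        x = torch.ones(2, 2, device=device)  # allocated on the chosen device
        print((x @ x).cpu())                 # matmul runs there, result copied back
```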

Core ML is the API.
> Why is there so much focus on running GPT models on Mac OS?

Because a MacBook with 96GB of RAM is cheaper than a GPU with anything close to that much VRAM.

So the question is: how much RAM do you need? You and another person mentioned 96GB; the person below says he can run it with 24GB. What's the right amount of RAM for now? Of course 128GB, the maximum, is best, but what's a good amount to have today? I never bought an M1 and am thinking of buying one now ;-)
I can run the 30B 4-bit model on my M2 Air with 24GB of RAM.
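One way to answer the "how much RAM" question is a back-of-the-envelope check. This is a sketch under assumptions, not a benchmark: about 4.5 effective bits per weight for a 4-bit format (per-block scale factors push it above 4), about 4GB reserved for macOS and other apps, nominal sizes of 7/13/30/65B (actual parameter counts differ slightly), and no allowance for the KV cache. min_ram_gb and largest_fitting are hypothetical helpers.

```python
# Sketch: estimate the RAM a 4-bit model needs on a unified-memory Mac.
# Assumptions (not from the thread): ~4.5 effective bits per weight and
# ~4 GB reserved for the OS. KV cache is ignored, so treat this as a floor.

from typing import Optional, Sequence

NOMINAL_SIZES_B = (7, 13, 30, 65)  # nominal parameter counts, in billions


def min_ram_gb(params_b: float, bits_per_weight: float = 4.5,
               os_reserve_gb: float = 4.0) -> float:
    """Estimated weight size in GB plus a fixed reserve for the OS."""
    weights_gb = params_b * bits_per_weight / 8  # (params_b * 1e9) * bits / 8 / 1e9
    return weights_gb + os_reserve_gb


def largest_fitting(ram_gb: float,
                    sizes: Sequence[int] = NOMINAL_SIZES_B) -> Optional[int]:
    """Largest nominal model size (billions) whose estimate fits in ram_gb."""
    fitting = [s for s in sizes if min_ram_gb(s) <= ram_gb]
    return max(fitting) if fitting else None


if __name__ == "__main__":
    for ram in (8, 16, 24, 32, 64, 96):
        print(f"{ram} GB RAM -> largest 4-bit model: {largest_fitting(ram)}B")
```

Under these assumptions a 24GB machine tops out at the 30B model (about 21GB estimated), which lines up with the M2 Air report, and the 65B model needs roughly 41GB, so 64GB is the next size that changes what you can run.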
Hi nickthegreek!

Could you tell me how you did that? Did you use FastChat or something else? Which model did you download, and what command do you run?

Thank you!!!

https://huggingface.co/Pi3141/alpaca-lora-30B-ggml

I believe I'm using alpaca.cpp with this command:

./chat -m <bin filename>
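If you'd rather script it, the same invocation can be wrapped from Python. Only the -m flag comes from the command above; build_chat_command and run_chat are hypothetical helpers, and the binary location and model path are placeholders.

```python
# Sketch: launch alpaca.cpp's chat binary from Python. Only "-m" is taken
# from the thread; the binary location and model path are hypothetical.

import subprocess
from pathlib import Path
from typing import List


def build_chat_command(model_path: str, binary: str = "./chat") -> List[str]:
    """Build the argv list equivalent to: ./chat -m <model_path>."""
    return [binary, "-m", model_path]


def run_chat(model_path: str, binary: str = "./chat") -> int:
    """Run the chat binary interactively and return its exit code."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"model file not found: {model_path}")
    # stdin/stdout are inherited, so the interactive prompt behaves as in a shell.
    return subprocess.run(build_chat_command(model_path, binary)).returncode


if __name__ == "__main__":
    # Hypothetical path; substitute the .bin file you downloaded.
    print(build_chat_command("path/to/model.bin"))
```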