by duffyjp 779 days ago
FYI for the Mac Mini idea: I have an M1 MacBook Pro with 32 GB. There's some sort of limit on how much RAM can be allocated to the GPU. Trying to run even a 22 GB model fails. The best I've gotten is Code Llama 34B 3-bit at 18.8 GB. There can be tons of RAM still free, but the LLM just loops endlessly, dropping a chunk of RAM and reloading it from disk.

Yes, Metal seems to allow at most 1/2 of the RAM for a single process and 3/4 of the RAM for the GPU overall. There's a kernel hack to lift the limit, but it comes with the usual system integrity caveats. https://github.com/ggerganov/llama.cpp/discussions/2182