Hacker News
by mrtranscendence 1158 days ago
I very recently purchased a MacBook Pro (M1 Max) with 64GB of RAM. I haven't experimented much, but I was able to run inference with the 65B-parameter LLaMA model using quantized weights at a reasonably usable speed (maybe a touch slower than ChatGPT with GPT-4).

I haven't tried the 65B model with non-quantized weights, but the smaller models do run that way, if slowly. With 96GB of RAM (the upper limit for a MacBook Pro) you might be able to run even larger models, but I think you'd hit the limits of useful performance before that point.
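A rough back-of-envelope check helps explain why quantization matters here: weight memory is roughly parameter count times bits per weight. The sketch below counts only the weights and ignores the KV cache, activations, quantization scale factors and runtime overhead, so real usage will be higher. The function names and the 0.75 "headroom" factor are hypothetical choices for illustration, not measured limits.

```python
def weight_memory_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights alone, in decimal GB (1e9 bytes).

    Ignores KV cache, activations, quantization scales and runtime overhead.
    """
    return n_params * bits_per_weight / 8 / 1e9


def fits(n_params: float, bits_per_weight: float, ram_gb: float,
         headroom: float = 0.75) -> bool:
    """Rough check that the weights fit in a fraction of total RAM.

    The default headroom of 0.75 is an arbitrary assumption meant to leave
    room for the OS, the KV cache and other processes.
    """
    return weight_memory_gb(n_params, bits_per_weight) <= ram_gb * headroom


if __name__ == "__main__":
    for bits in (4, 8, 16):
        gb = weight_memory_gb(65e9, bits)
        print(f"65B @ {bits:>2}-bit: ~{gb:.1f} GB, "
              f"fits 64 GB: {fits(65e9, bits, 64)}, "
              f"fits 96 GB: {fits(65e9, bits, 96)}")
```

Under these assumptions a 4-bit 65B model needs about 32.5 GB for its weights, which fits comfortably in 64GB, while 16-bit (non-quantized fp16) weights need about 130 GB, more than even a 96GB machine has. Practical 4-bit formats usually store per-block scale factors as well, so passing a `bits_per_weight` somewhat above 4 gives a more realistic estimate.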

I should note that getting things to work on the Mac's GPU can be a bit tricky. I couldn't get Dolly 6B to run on my work MBP, which in theory should have enough RAM, though I still want to try it on my personal laptop.
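The post doesn't say which framework was used, but assuming a PyTorch-based setup (as is common for Hugging Face models like Dolly), a reasonable first step is confirming that PyTorch can see the Mac's GPU through its Metal (MPS) backend at all. The helper name `pick_device` below is hypothetical; `torch.backends.mps.is_available()` and `torch.cuda.is_available()` are real PyTorch calls, though the MPS backend only exists in newer PyTorch releases, hence the `getattr` guard.

```python
def pick_device() -> str:
    """Return the best available PyTorch device name, preferring Apple's MPS backend.

    Falls back to "cpu" if PyTorch isn't installed or no accelerator is available.
    """
    try:
        import torch
    except ImportError:
        return "cpu"
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


if __name__ == "__main__":
    print(pick_device())
```

A model would then be moved with something like `model.to(pick_device())`. Even when MPS is available, some operations may not be implemented for it yet; setting the environment variable `PYTORCH_ENABLE_MPS_FALLBACK=1` lets PyTorch run those ops on the CPU instead of raising an error, at some cost in speed.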


I see a refurbished M1 with 2TB/128GB for $4,700, and with my corporate discount an M2 with the same storage and RAM looks to be a similar price (20-core CPU/48-core GPU). This is a tough decision.