by noman-land 806 days ago
My first thought was: how much RAM? Will it work on a 64GB M1?

It is ~260GB with presumably fp16 weights. Should fit into 64GB at 3-bit quantization (~49GB).

Edit: To add to this, I've had good luck getting solid output out of mixtral 8x7b at 3-bit, so that isn't small enough to completely kill the model's quality.
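
For anyone who wants to redo the back-of-envelope math above, here is a rough sketch in Python. It assumes the ~260GB figure really is all fp16 weights (16 bits each) and that quantization scales size linearly with bits per weight. The function name is made up for illustration, and real quantized formats add some overhead for scales and metadata, so treat the result as a lower bound on the weights alone, with no allowance for KV cache or the OS.

```python
def quantized_size_gb(size_gb: float, src_bits: int, dst_bits: int) -> float:
    """Estimate weight size after re-quantizing from src_bits to dst_bits per weight.

    Assumes size scales linearly with bits per weight and ignores
    per-block scales and other format overhead (hypothetical helper).
    """
    n_params = size_gb * 1e9 * 8 / src_bits   # bytes -> bits -> parameter count
    return n_params * dst_bits / 8 / 1e9      # bits -> bytes -> GB


# ~260GB at fp16 implies roughly 130B parameters
fp16_size_gb = 260
three_bit_gb = quantized_size_gb(fp16_size_gb, src_bits=16, dst_bits=3)
print(f"3-bit estimate: {three_bit_gb:.2f} GB")  # 48.75 GB, i.e. the ~49GB above
```

That leaves roughly 15GB of headroom on a 64GB machine before accounting for context and everything else running, so it would be tight.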

I wonder, can you quantize it yourself with some tool?
Thanks!!
Nope. Just the weights would take 88GB at 4-bit. A 128GB MBP ought to be able to run it. If I were to guess, a version for Apple MLX should be available within a few days, for those of us fortunate enough to own such a thing.
It’s already available. I had it running yesterday morning on an M3 Max 128GB. I get about 6 tps.

https://www.reddit.com/r/LocalLLaMA/s/MSsrqWHYga