Thank you Josh. Is there a resource you can point us to that helps answer "what kind of MacBook Pro memory do I need to run ABC model at XYZ quantization?"
In general you can just use the parameter count to figure that out.
A 70B model at 8 bits per parameter means 70GB of weights, at 4 bits it's 35GB, etc. But that is just for the raw weights. You also need some RAM to store the data that is passing through the model, and the OS eats up some too, so add about a 10-15% buffer on top of that to make sure you're good.
Also, the quality falls off pretty quickly once you start quantizing below 4-bit, so be careful with that, but at 3-bit a 70B model is about 26GB of weights, so it should fit on 32GB of RAM, just without much headroom.
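
If you want to plug in your own numbers, here is a rough sketch of that rule of thumb in Python. The function name and the 1 GB = 1e9 bytes convention are my own assumptions, and the 10-15% buffer is just the ballpark from above, not a measured figure, so treat the output as an estimate only.

```python
def estimate_memory_gb(params_billions: float, bits_per_param: float,
                       overhead: float = 0.15) -> float:
    """Rough RAM estimate in GB (assumes 1 GB = 1e9 bytes).

    overhead is the fractional buffer for activations and the OS
    (0.10 to 0.15 per the rule of thumb; an assumption, not a measurement).
    """
    # Raw weights: parameters * bits per parameter, converted to bytes.
    weights_gb = params_billions * bits_per_param / 8
    return weights_gb * (1 + overhead)


if __name__ == "__main__":
    # 70B model at the quantization levels discussed above.
    for bits in (8, 4, 3):
        low = estimate_memory_gb(70, bits, overhead=0.10)
        high = estimate_memory_gb(70, bits, overhead=0.15)
        print(f"70B at {bits}-bit: roughly {low:.0f} to {high:.0f} GB")
```

Swap in a different parameter count or bit width to check another model against the RAM you have.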