| HN Mirror

In general you can just use the parameter count to figure that out.

70B model at 8 bits per parameter would mean 70GB, 4 bits is 35GB, etc. But that is just for the raw weights, you also need some ram to store the data that is passing through the model and the OS eats up some, so add about a 10-15% buffer on top of that to make sure you're good.

Also the quality falls off pretty quick once you start quantizing below 4-bit so be careful with that, but at 3-bit a 70B model should run fine on 32GB of ram.