Hacker News new | ask | show | jobs
by selfhoster11 1022 days ago
6-bit quantizations are supposed to be nearly equivalent to 8-bit, and that does chop 1.5 GB off the model size. I think a 6-bit model should therefore fit, or if that doesn't, 5-bit medium or 5-bit small surely will.

There is always an option to go down the list of available quantizations notch by notch until you find the largest model that works. llama.cpp offers a lot of flexibility in that regard.