
by SushiHippie 941 days ago

Since the model is very small, you should be able to run any quantization level on an M-series MacBook with at least 16GB of RAM. The best speed/quality trade-off will probably be Q6_K: it differs little from Q8 in quality, but will definitely be faster.

I haven't tried this one specifically, but I always run 7B-parameter models on an M2 Pro with Q6_K or Q4_K_M (depending on how fast I want it).

See also this table in the readme, which states that Q8 only needs ~10GB of RAM: https://huggingface.co/TheBloke/MonadGPT-GGUF?text=Hey+my+na...
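
For a rough sanity check of that figure, here is a back-of-the-envelope sketch in Python. The bits-per-weight values are approximate numbers for llama.cpp quant types, and the 7e9 parameter count and the flat 2.5 GB runtime overhead are my own assumptions, not values taken from the readme, so treat the output as a ballpark only.

```python
# Rough RAM estimate for a quantized model: weights + a flat overhead.
# All constants below are approximations/assumptions, not official figures.
BITS_PER_WEIGHT = {
    "Q4_K_M": 4.85,  # approximate average, mixed quant types
    "Q6_K": 6.56,    # approximate
    "Q8_0": 8.5,     # 8-bit weights plus per-block scale
}


def estimate_ram_gb(n_params, quant, overhead_gb=2.5):
    """Return an estimated RAM requirement in GB (1 GB = 1e9 bytes)."""
    weights_gb = n_params * BITS_PER_WEIGHT[quant] / 8 / 1e9
    return weights_gb + overhead_gb


if __name__ == "__main__":
    n_params = 7e9  # assumed parameter count for a "7B" model
    for quant in BITS_PER_WEIGHT:
        print(f"{quant}: ~{estimate_ram_gb(n_params, quant):.1f} GB")
```

Under these assumptions Q8 comes out at roughly 10 GB, with Q6_K and Q4_K_M a couple of GB lower each, so all of them should fit comfortably in 16GB.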