Hacker News new | ask | show | jobs
by omneity 149 days ago
Except this is GLM 4.7 Flash which has 32B total params, 3B active. It should fit with a decent context window of 40k or so in 20GB of ram at 4b weights quantization and you can save even more by quantizing the activations and KV cache to 8bit.
1 comments

yes, but the parrent link was to the big glm 4.7 that had a bunch of ggufs, the new one at the point of posting did not, nor does it now. im waiting for unsloth guys for the 4.7 flash