HN Mirror

	by evnc 928 days ago
	Have you looked into quantization? At 8-bit quantization, a 7B model needs ~7GB of RAM (plus a bit of overhead); at 4-bit, it needs around 3.5GB and would fit entirely in the RAM you have. Generation quality does degrade somewhat the fewer bits you use, but not as much as you might think.
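
For anyone who wants to check those numbers, here is a rough back-of-the-envelope sketch. It assumes memory is dominated by the weights (parameters × bits per weight ÷ 8) and ignores the overhead mentioned above, such as the KV cache, activations, and per-block quantization metadata. The function name `weight_memory_gb` is made up for illustration and is not from any library.

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes).

    Assumption: every parameter is stored at the same bit width and
    nothing else (KV cache, activations, runtime buffers) is counted.
    """
    bytes_total = n_params * bits_per_weight / 8
    return bytes_total / 1e9


# A 7B-parameter model at a few common bit widths.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(7e9, bits):.1f} GB")
```

Real quantized files tend to come out a little larger than this estimate, since most formats store scale factors alongside the weights, so treat the result as a lower bound.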

1 comment

This is interesting. I've written up how I set it up here: https://christiaanse.ca/posts/running_llm/