by evnc 928 days ago
Have you looked into quantization? At 8-bit quantization, a 7B model requires ~7GB of RAM (plus a bit of overhead); at 4-bit, it would require around 3.5GB and fit entirely into the RAM you have. Generation quality does degrade somewhat as you quantize to fewer bits, but not as much as you might think.
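
For what it's worth, the arithmetic behind those numbers is just parameter count times bits per weight, divided by 8 to get bytes. A rough sketch follows. The function name is made up for illustration, and it only counts the weights: the KV cache, activations, and any extra scale data that real quantization formats store are assumed to be part of the "bit of overhead".

```python
def weight_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate memory for the model weights alone, in GB (10^9 bytes)."""
    bytes_total = n_params * bits_per_weight / 8  # 8 bits per byte
    return bytes_total / 1e9


# A 7B-parameter model at a few common precisions.
for bits in (16, 8, 4):
    print(f"{bits}-bit: ~{weight_memory_gb(7e9, bits):.1f} GB")
# 16-bit: ~14.0 GB
# 8-bit: ~7.0 GB
# 4-bit: ~3.5 GB
```

Actual usage will be somewhat higher than this estimate, so leave some headroom when comparing it against the RAM you have.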

This is interesting. I've written up how I set it up here: https://christiaanse.ca/posts/running_llm/