Hacker News
by dragonwriter 813 days ago
Quantization, CPU mode, and hybrid mode (where the model is split between CPU and GPU) all exist and work well for LLMs. In the end, though, more VRAM is a massive quality-of-life improvement for running them (and probably even more so for training, which has higher RAM needs and for which quantization isn't useful, AFAIK), even if you technically can do it on CPU alone or in hybrid mode with no or lower VRAM requirements.
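
To put rough numbers on that, here is a back-of-the-envelope sketch. It counts weight memory only (no KV cache, activations, or runtime overhead) and assumes all layers are the same size. The function names and the 7B / 32-layer / 8 GB figures are made up for illustration and not taken from any particular model or tool.

```python
def weight_bytes(n_params: int, bits_per_weight: float) -> float:
    """Approximate memory needed for the weights alone, in bytes."""
    return n_params * bits_per_weight / 8


def gpu_layers(n_layers: int, n_params: int, bits_per_weight: float,
               vram_bytes: float) -> int:
    """How many equal-sized layers fit in VRAM; the rest would stay on CPU.

    Simplification: treats every layer as the same size and ignores
    everything except the weights.
    """
    per_layer = weight_bytes(n_params, bits_per_weight) / n_layers
    return min(n_layers, int(vram_bytes // per_layer))


# Hypothetical 7B-parameter model with 32 layers on a card with 8 GB of VRAM.
N_PARAMS = 7_000_000_000
N_LAYERS = 32
VRAM = 8e9

for bits in (16, 8, 4):
    gb = weight_bytes(N_PARAMS, bits) / 1e9
    on_gpu = gpu_layers(N_LAYERS, N_PARAMS, bits, VRAM)
    print(f"{bits:>2}-bit: ~{gb:.1f} GB of weights, {on_gpu}/{N_LAYERS} layers on GPU")
```

Under these assumptions the 16-bit weights don't fit and you end up in hybrid mode, while the 4-bit version fits entirely on the GPU. Real numbers will be worse once you add context and overhead, which is the point about more VRAM making life easier.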