by irusensei 876 days ago
Why not both? llama.cpp can split a GGUF model's layers between GPU and CPU memory, so the layers that fit in VRAM run on the GPU and the rest stay on the CPU.
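For a rough idea of how such a split might be sized, here is a minimal sketch. It assumes every layer is about the same size and ignores the KV cache and scratch buffers unless you account for them in `reserve_bytes`. The helper `layers_on_gpu` and its parameters are hypothetical, not part of llama.cpp; the result is only a starting guess for llama.cpp's `-ngl` / `--n-gpu-layers` option.

```python
def layers_on_gpu(n_layers: int, model_bytes: int, vram_bytes: int,
                  reserve_bytes: int = 0) -> int:
    """Estimate how many layers fit in VRAM.

    Hypothetical helper: assumes all layers are equally sized, which is
    only approximately true for real GGUF files.
    """
    if n_layers <= 0 or model_bytes <= 0:
        raise ValueError("n_layers and model_bytes must be positive")
    # VRAM left after setting aside room for KV cache, buffers, etc.
    usable = max(0, vram_bytes - reserve_bytes)
    # floor(usable / (model_bytes / n_layers)), kept in integer arithmetic
    fit = usable * n_layers // model_bytes
    return min(n_layers, fit)


GIB = 2**30
# Example: a 32-layer model of 32 GiB on a card with 8 GiB of VRAM,
# keeping 2 GiB in reserve.
guess = layers_on_gpu(32, 32 * GIB, 8 * GIB, reserve_bytes=2 * GIB)
```

In practice you would start from an estimate like this and lower the layer count if the model fails to load or runs out of memory.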