Hacker News
by e12e 1099 days ago
But (in theory) llama.cpp could implement a similar approach to paging/memory and see a speedup for 4-bit models on CPU?