Hacker News
by EnPissant 279 days ago
llama.cpp supports running some or all of the layers on the CPU. It does not swap them into the GPU as needed.
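
To make the distinction concrete, here is a toy sketch of a static split. It is not llama.cpp's code: `assign_devices` and `run_forward` are hypothetical names, and which end of the stack lands on the GPU is an assumption made for illustration. In llama.cpp itself the number of offloaded layers is set with `-ngl` / `--n-gpu-layers`. The point is only that placement is decided once at load time and every layer then runs where it already lives.

```python
def assign_devices(n_layers: int, n_gpu_layers: int) -> list[str]:
    """Decide once, at load time, where each layer lives (toy model)."""
    # Clamp the request so it never exceeds the number of layers.
    n_gpu = max(0, min(n_gpu_layers, n_layers))
    # Assumption for illustration: the remaining layers stay on the CPU.
    return ["cpu"] * (n_layers - n_gpu) + ["gpu"] * n_gpu


def run_forward(placement: list[str]) -> list[str]:
    """Visit each layer on its assigned device; no weights are moved."""
    return [f"layer {i} on {dev}" for i, dev in enumerate(placement)]


# 32 layers with 20 offloaded: 12 run on the CPU for every token.
placement = assign_devices(n_layers=32, n_gpu_layers=20)
trace = run_forward(placement)
```

A swapping scheme would instead copy each CPU-resident layer's weights into GPU memory just before using it, which is the behaviour the comment says llama.cpp does not implement.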