Hacker News new | ask | show | jobs
by tempaccount420 1194 days ago
Doesn't this use as much VRAM as llama.cpp (with int4 models) uses RAM? RAM is a lot cheaper than VRAM.
1 comments

It won't run as fast on your CPU at it will run on a GPU. Also, it might clog most of your RAM; it's better to offload to a cheap GPU.