by rahimnathwani 508 days ago
IIRC it makes things a little easier, e.g. you don't need to specify a CLI flag to set how many layers to offload to the GPU, and it provides an API that other programs on your system can use (e.g. Open WebUI).

It's been a while since I used llama.cpp directly, and I don't know whether I'm correct about its current scope.