Hacker News
by
dminik
67 days ago
You can now serve multiple models, with loading and unloading, using just the server binary.
https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...
1 comment
speedgoose
67 days ago
Then it only lacks automatic FIFO loading/unloading. Maybe that will be there in a few weeks.
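For readers unfamiliar with the idea, here is a minimal sketch of what automatic FIFO loading/unloading could look like. It is not llama.cpp's code or API: the `FifoModelPool` class and the `load`/`unload` callbacks are hypothetical names made up for illustration, and a real server would also have to deal with memory limits and in-flight requests.

```python
from collections import OrderedDict
from typing import Callable, Generic, List, TypeVar

M = TypeVar("M")


class FifoModelPool(Generic[M]):
    """Hypothetical pool that keeps at most `capacity` models loaded.

    When the pool is full, the model that was loaded earliest is
    unloaded first (FIFO), regardless of how recently it was used.
    """

    def __init__(
        self,
        capacity: int,
        load: Callable[[str], M],      # assumed callback: name -> loaded model
        unload: Callable[[M], None],   # assumed callback: frees a loaded model
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._load = load
        self._unload = unload
        self._loaded: "OrderedDict[str, M]" = OrderedDict()

    def get(self, name: str) -> M:
        # A hit returns the model without touching the eviction order;
        # that is what distinguishes FIFO from LRU.
        if name in self._loaded:
            return self._loaded[name]
        # Make room by unloading the earliest-loaded model(s).
        while len(self._loaded) >= self.capacity:
            _, oldest = self._loaded.popitem(last=False)
            self._unload(oldest)
        model = self._load(name)
        self._loaded[name] = model
        return model

    def loaded_names(self) -> List[str]:
        # Names in load order, oldest first.
        return list(self._loaded)
```

With a capacity of 2, requesting models "a", "b", "a", "c" in that order would unload "a" when "c" arrives, even though "a" was just used. An LRU policy would unload "b" instead.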