	by dminik 67 days ago
	You can now serve multiple models, with loading and unloading, using just the server binary. https://github.com/ggml-org/llama.cpp/blob/master/tools/serv...

1 comment

It only lacks the automatic FIFO loading/unloading then. Maybe it will be there in a few weeks.
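
For anyone unsure what "automatic FIFO loading/unloading" would mean in practice, here is a minimal sketch of the general idea, not of anything in llama.cpp. It assumes a fixed cap on how many models may be resident at once, and when a new model is requested past that cap, the one that was loaded earliest is unloaded first. The class name `FifoModelPool` and the `load`/`unload` callbacks are hypothetical placeholders for whatever actually starts and stops a model.

```python
from collections import OrderedDict
from typing import Callable, List


class FifoModelPool:
    """Keep at most `capacity` models loaded; evict the oldest-loaded one first.

    Hypothetical illustration only: `load` and `unload` stand in for whatever
    mechanism really brings a model into memory and releases it.
    """

    def __init__(
        self,
        capacity: int,
        load: Callable[[str], object],
        unload: Callable[[str, object], None],
    ) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._load = load
        self._unload = unload
        # Insertion order doubles as load order, which is all FIFO needs.
        self._loaded: "OrderedDict[str, object]" = OrderedDict()

    def get(self, name: str) -> object:
        # Already resident: return it without reordering (FIFO, not LRU).
        if name in self._loaded:
            return self._loaded[name]
        # Make room first so the pool never exceeds its capacity.
        while len(self._loaded) >= self.capacity:
            old_name, old_model = self._loaded.popitem(last=False)
            self._unload(old_name, old_model)
        model = self._load(name)
        self._loaded[name] = model
        return model

    def loaded_names(self) -> List[str]:
        # Oldest-loaded first.
        return list(self._loaded)
```

Note that with plain FIFO a model can be evicted even if it was used a moment ago, since only load order matters. An LRU policy would instead move a model to the back of the queue on every use.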