| HN Mirror



	by bildung 160 days ago
	vLLM usually only shows its strength when serving multiple users in parallel, in contrast to llama.cpp (Ollama is a wrapper around llama.cpp). If you want more performance, you could try running llama.cpp directly or use the prebuilt lemonade nightlies.
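	To make the single-user vs. parallel distinction concrete, here is a minimal sketch of how per-request t/s and aggregate t/s differ. The `RequestTiming` type, the helper names, and all numbers are invented for illustration; they are not measurements from vLLM, llama.cpp, or Ollama.

```python
from dataclasses import dataclass


@dataclass
class RequestTiming:
    """Hypothetical record of one generation request."""
    tokens: int    # tokens generated for this request
    start: float   # wall-clock start, in seconds
    end: float     # wall-clock end, in seconds


def per_request_tps(r: RequestTiming) -> float:
    """Tokens per second as seen by a single user."""
    return r.tokens / (r.end - r.start)


def aggregate_tps(timings: list[RequestTiming]) -> float:
    """Total tokens per second across all overlapping requests."""
    total_tokens = sum(r.tokens for r in timings)
    wall_time = max(r.end for r in timings) - min(r.start for r in timings)
    return total_tokens / wall_time


# Made-up numbers: one user alone, then eight users served in parallel.
single = [RequestTiming(tokens=200, start=0.0, end=10.0)]
batch = [RequestTiming(tokens=200, start=0.0, end=12.5) for _ in range(8)]

print(per_request_tps(single[0]))  # what one user sees when alone
print(per_request_tps(batch[0]))   # what one user sees under load
print(aggregate_tps(batch))        # what the server delivers in total
```

	With these invented numbers each user gets slightly fewer t/s under load, while the server's total output is several times higher. A single-user benchmark only measures the first number, so it will not show that kind of advantage.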

1 comment

But vLLM was half the t/s of Ollama, so something was obviously not ok.