HN Mirror: Hacker News

	by steren 317 days ago
	> I would never want to use something like ollama in a production setting.

	We benchmarked vLLM and Ollama on both startup time and tokens per second. Ollama came out on top. We hope to publish these results soon.

3 comments

ekianjo 317 days ago

you need to benchmark against llama.cpp as well.

apitman 317 days ago

Did you test multi-user cases?

jasonjmcghee 317 days ago

Assuming this is equivalent to parallel sessions, I would hope so; this is pretty much the entire point of vLLM.

sbinnee 317 days ago

vLLM and Ollama assume different settings and hardware. vLLM, backed by PagedAttention, expects a lot of requests from multiple users, whereas Ollama is usually for a single user on a local machine.
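
To make the single-user versus multi-user distinction concrete, here is a minimal sketch of how one might compute tokens per second both ways. The `RequestRecord` type, the function names, and every number below are hypothetical and invented for illustration; none of them are measurements of vLLM or Ollama.

```python
from dataclasses import dataclass


@dataclass
class RequestRecord:
    """Timing for one completed request (hypothetical record type)."""
    start_s: float       # when the request was sent
    end_s: float         # when the last token arrived
    output_tokens: int   # number of generated tokens


def per_request_tps(record: RequestRecord) -> float:
    """Tokens per second as seen by a single user."""
    return record.output_tokens / (record.end_s - record.start_s)


def aggregate_tps(records: list[RequestRecord]) -> float:
    """Total tokens divided by the wall-clock span covering all requests."""
    wall_s = max(r.end_s for r in records) - min(r.start_s for r in records)
    return sum(r.output_tokens for r in records) / wall_s


# Made-up scenario A: four requests served one after another.
serial = [
    RequestRecord(0.0, 2.0, 100),
    RequestRecord(2.0, 4.0, 100),
    RequestRecord(4.0, 6.0, 100),
    RequestRecord(6.0, 8.0, 100),
]

# Made-up scenario B: four requests served together, each one slower.
batched = [RequestRecord(0.0, 4.0, 100) for _ in range(4)]

for name, records in (("serial", serial), ("batched", batched)):
    print(name, per_request_tps(records[0]), aggregate_tps(records))
```

In this invented scenario the serial server looks faster to each individual user, while the batched server delivers more tokens per second overall, which is why a benchmark should state how many concurrent sessions it used.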