| HN Mirror

	by nailk 1021 days ago
	Looks great! Are there other benchmarks? How does the speed compare to other LLM engines like llama.cpp or vLLM (on GPUs)? Can it do continuous batching of incoming requests, as vLLM does?