by nailk 1021 days ago
Looks great! Are there other benchmarks? How does the speed compare to other LLM engines like llama.cpp or vLLM (on GPUs)? Can it do continuous batching of incoming requests like vLLM does?