Hacker News
by nailk 1021 days ago
Looks great! Are there other benchmarks? How does the speed compare to other LLM engines like llama.cpp / vllm (on GPUs)? Is it able to do continuous batching of incoming requests like vllm?