Hacker News
by yu3zhou4, 106 days ago
I’m recreating a tiny version of vLLM (a high-throughput LLM inference server) from scratch in C++ and CUDA
https://github.com/jmaczan/tiny-vllm