SGLang: Fast and Expressive LLM Inference with RadixAttention for 5x Throughput (github.com)
2 points by covi 853 days ago