SGLang: Fast and Expressive LLM Inference with RadixAttention for 5x Throughput

SGLang: Fast and Expressive LLM Inference with RadixAttention for 5x Throughput (github.com)
2 points by covi 853 days ago