	by iAkashPaul 660 days ago
	Pretty sure this was never in question for batched requests. SGLang/LMDeploy/TensorRT-LLM will get nearly twice the reported speeds with INT8 (FP16 on an A100 is benchmarked here: https://github.com/sgl-project/sglang?tab=readme-ov-file#ben...)