by iAkashPaul 660 days ago
Pretty sure this was never in question for batched requests. SGLang, LMDeploy, and TensorRT-LLM will get nearly twice the reported speeds with INT8 (fp16 A100 benchmarks here: https://github.com/sgl-project/sglang?tab=readme-ov-file#ben...).