Hacker News
by badmonster 408 days ago
1000+ tokens/sec on H100s, a 5–10x speedup over typical autoregressive models, and without needing exotic hardware like Groq or Cerebras. Impressive.
1 comment
lostmsu 408 days ago
Would batch inference increase throughput further? Or does it already hit peak FLOPS?