HN Mirror



	by badmonster 456 days ago
	1000+ tokens/sec on H100s, a 5–10x speedup over typical autoregressive models, and without needing exotic hardware like Groq or Cerebras. Impressive.

1 comment

Would batch inference increase throughput further? Or does it already saturate the GPU's peak FLOPS?