| HN Mirror



	by bmc7505 123 days ago
	17k TPS (tokens per second) is slow compared to other probabilistic models. Decades ago, n-gram and PDFA (probabilistic deterministic finite automaton) models could already hit ~10-20 million TPS without custom silicon. A more informative KPI would be pass@k on a downstream reasoning task: for many such benchmarks, increasing token throughput by several orders of magnitude does not even move the needle on sample efficiency.
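
	For anyone unfamiliar with pass@k: it is commonly estimated with the unbiased combinatorial estimator from the Codex/HumanEval evaluation, 1 - C(n-c, k) / C(n, k), where n samples are drawn per problem and c of them pass. A minimal sketch follows; the function name `pass_at_k` and the example numbers are illustrative assumptions, not taken from any particular benchmark harness.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one problem.

    n: total samples drawn for the problem
    c: number of those samples that passed
    k: sample budget being evaluated (1 <= k <= n)
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # Probability that a random size-k subset of the n samples
    # contains no passing sample is C(n - c, k) / C(n, k).
    # math.comb returns 0 when n - c < k, giving pass@k = 1.0.
    return 1.0 - comb(n - c, k) / comb(n, k)


if __name__ == "__main__":
    # Hypothetical numbers: 10 samples drawn, 3 passed.
    for k in (1, 5, 10):
        print(f"pass@{k} = {pass_at_k(10, 3, k):.4f}")
```

	Note that tokens per second appears nowhere in the estimator. Faster generation only lets you draw more samples per unit of wall-clock time; for a fixed k, the metric depends solely on how often a sample is correct.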