	by ossianericson 101 days ago
	Wondering if the verifier accounts for load-induced variance, not just weight fidelity. I've been working on something recently where we sidestepped runtime inference entirely, partly because even with identical weights, prefix caching and continuous batching can shift outputs enough on long-horizon tasks that you don't really know what you're measuring. A provider can pass on a quiet cluster and fail at 80% GPU utilization without anyone touching the checkpoint.
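
	For anyone wondering how identical weights can give different outputs, here is a minimal Python sketch. It assumes one plausible mechanism: floating-point addition is not associative, so if batch composition changes the order in which a kernel accumulates partial sums, a logit can move by one ulp, and a near-tie under greedy decoding can flip. The function names and toy numbers are hypothetical and not taken from any real serving stack.

```python
from functools import reduce

def sum_left_to_right(xs):
    # Accumulate in the given order: ((x0 + x1) + x2) + ...
    return reduce(lambda acc, x: acc + x, xs)

def sum_right_to_left(xs):
    # Accumulate in reverse order: x0 + (x1 + (x2 + ...))
    return reduce(lambda acc, x: x + acc, reversed(xs))

def greedy_pick(logits):
    # Index of the largest logit; the first index wins on an exact tie.
    return max(range(len(logits)), key=logits.__getitem__)

# Hypothetical partial sums feeding a single logit.
contribs = [0.1, 0.2, 0.3]

# Same numbers, same "weights", two different accumulation orders.
logit_quiet = sum_left_to_right(contribs)
logit_loaded = sum_right_to_left(contribs)

# A competing token whose logit sits right at the boundary.
competitor = 0.6

token_quiet = greedy_pick([competitor, logit_quiet])
token_loaded = greedy_pick([competitor, logit_loaded])

print(logit_quiet, logit_loaded)   # the two sums differ in the last bit
print(token_quiet, token_loaded)   # the greedy choice differs as a result
```

	On a long-horizon task, one flipped token early on is enough for the rest of the trajectory to diverge, which is why a single pass/fail run under one load condition says little about the checkpoint itself.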