by ossianericson
54 days ago
|
|
I'm wondering whether the verifier accounts for load-induced variance, not just weight fidelity. I've been working on something recently where we sidestepped runtime inference entirely, partly because even with identical weights, prefix caching and continuous batching can shift outputs enough on long-horizon tasks that you don't really know what you're measuring. A provider can pass on a quiet cluster and fail at 80% GPU utilization without anyone touching the checkpoint. |
|
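To make the "identical weights, different outputs" point concrete, here is a toy sketch of one commonly cited mechanism: floating-point addition is not associative, so a change in reduction order (which a different batch shape or a cache hit could plausibly trigger) can change a logit, and a near-tied greedy pick can flip. The numbers, the `reduce_in_order` and `pick_token` helpers, and the two-token vocabulary are all made up for illustration and deliberately exaggerated; this is not a model of any real serving stack.

```python
def reduce_in_order(terms):
    """Add terms strictly left to right with plain float addition."""
    total = 0.0
    for t in terms:
        total += t
    return total


def pick_token(logits):
    """Greedy decoding: return the token with the largest logit."""
    return max(logits, key=logits.get)


# Same three contributions to one logit, accumulated in two different orders.
# Hypothetical values chosen so that rounding is visible even in float64.
quiet_order = [1e16, -1e16, 1.0]   # (1e16 - 1e16) + 1.0
loaded_order = [1e16, 1.0, -1e16]  # (1e16 + 1.0) - 1e16, the 1.0 is rounded away

logit_quiet = reduce_in_order(quiet_order)
logit_loaded = reduce_in_order(loaded_order)

# A competing token sits close enough that the rounding difference decides the pick.
token_quiet = pick_token({"yes": logit_quiet, "no": 0.5})
token_loaded = pick_token({"yes": logit_loaded, "no": 0.5})

print(logit_quiet, logit_loaded)
print(token_quiet, token_loaded)
```

In practice the differences would be far smaller than this, but on a long-horizon task a single flipped token early on is enough to send the rest of the trajectory somewhere else, which is why comparing one quiet-cluster run against one loaded run says little about the checkpoint itself.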