
The 20% is a safety margin on the memory fit check only. It sits on top of the raw weights-only figure (params × bytes-per-precision) to account for KV cache and activation tensors, not framework differences specifically. Your point is valid, but I think it applies to a different layer. PyTorch vs ONNX overhead is real, but it's implicitly captured in the throughput path: Tier 2 scales from real-world benchmarks that already reflect whatever framework ran them. The 20% is intentionally conservative: it'll occasionally say a model won't fit when it technically could, but it won't tell you something fits and then OOM you.
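To make the fit check concrete, here's a rough sketch of the arithmetic described above. Only the formula (params × bytes-per-precision) and the 20% margin come from this comment; the function names, the precision table, and the example numbers are hypothetical and not the tool's actual code.

```python
# Illustrative sketch only: names and the precision table are assumptions.
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

# Headroom on top of the weights for KV cache and activation tensors.
SAFETY_MARGIN = 0.20


def weights_bytes(num_params: float, precision: str) -> float:
    """Raw weights-only footprint: params x bytes-per-precision."""
    return num_params * BYTES_PER_PARAM[precision]


def required_bytes(num_params: float, precision: str) -> float:
    """Weights-only footprint plus the 20% safety margin."""
    return weights_bytes(num_params, precision) * (1 + SAFETY_MARGIN)


def fits(num_params: float, precision: str, available_bytes: float) -> bool:
    """Conservative fit check: True only if the padded figure fits."""
    return required_bytes(num_params, precision) <= available_bytes


if __name__ == "__main__":
    params = 7e9       # hypothetical 7B-parameter model
    budget = 16e9      # hypothetical 16 GB (decimal) memory budget
    print("weights only:", weights_bytes(params, "fp16"))
    print("with margin: ", required_bytes(params, "fp16"))
    print("fits:        ", fits(params, "fp16", budget))
```

With these made-up numbers the raw fp16 weights are under the budget, but the padded figure is not, so the check says no. That's the conservative failure mode: a false "won't fit" rather than a false "fits" followed by an OOM.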