Hacker News
by amelius 142 days ago
> loading them with torch.load() can lead to RCE (remote command execution)

Why didn't the Torch team fix this?

2 comments

OP misunderstands: the issue is specifically with the pickle format and similar ones, because they're essentially code that needs to be executed, not just data to be loaded. Most of the ecosystem has already moved to the .safetensors format, which is just data and doesn't suffer from that issue.
Safetensors solves RCE, but it doesn't solve legal liability. I scan .safetensors because metadata headers often contain restrictive licenses (like CC-BY-NC) that contradict the repo's README. Deploying a non-commercial model in a commercial SaaS is a security/compliance incident, even if no code is executed (PS I'm in the EU and it's important for us).
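
For anyone wondering what "scanning the metadata header" could look like, here is a rough sketch using only the standard library. It relies on the documented .safetensors layout (an 8-byte little-endian header length, then a JSON header with an optional `__metadata__` map). The `license` key and the allow-list are assumptions for illustration: metadata keys are free-form, so a given file may use another key or none at all.

```python
import json
import struct


def read_safetensors_metadata(path):
    """Return the free-form __metadata__ dict from a .safetensors header."""
    with open(path, "rb") as f:
        # First 8 bytes: little-endian unsigned 64-bit length of the JSON header.
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len).decode("utf-8"))
    # Tensor entries sit next to an optional "__metadata__" string-to-string map.
    return header.get("__metadata__", {})


def license_conflicts(metadata, allowed=("apache-2.0", "mit")):
    """Hypothetical policy check: True if a declared license is off the allow-list."""
    declared = metadata.get("license")  # assumed key name, not part of the format
    return declared is not None and declared.lower() not in allowed
```

Only the header is read, so no tensor data is loaded and nothing is executed. A real check would also compare the result against the license declared in the repo's README.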

Additionally, a massive portion of the ecosystem is still stuck on Pickle/PyTorch .bin.

Right, but in these environments (PS, I'm also in the EU and also work in the ecosystem) we don't just deploy third-party data willy-nilly. You take some sort of ownership of the data, review and polish it, and then deploy that. Since security and compliance are important for you, I'm assuming you're doing the same?

And when you're doing that, you have plenty of opportunity to turn Pickle into whatever format you want, since you're holding and owning the data anyways.

Don't you suppose that in a large company with teams of 50+ devs/DS pulling models for experiments, enforcing a manual "review+polish+convert" workflow for every single artifact can create a massive bottleneck and, as a result, shadow IT? Doesn't it make sense to automate the "review" part?
If you run teams with 50+ devs then you MUST ensure the pipelines actually work for every single project they work on. You don't PATCH validation on top of infrastructure that already seems brittle.

But I don't manage the infrastructure where you work, so I don't have the full picture. It sounds to me like there is a different issue going on; the issue isn't "Some HF repos use Git LFS so we need a tool to flag those".

PyTorch relies on Python's pickle module for serialization, and a pickle stream is essentially a program for a stack-based virtual machine. This allows for saving arbitrary Python objects, custom classes, etc., but the trade-off is security. The PyTorch docs explicitly say: "Only load data you trust."
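
To make the "code, not data" point concrete, here is a minimal, harmless sketch using only the standard library (no PyTorch involved). The `Payload` class and the arithmetic expression are made up for illustration. An attacker would put something far less friendly in place of the expression.

```python
import pickle


class Payload:
    def __reduce__(self):
        # Tells pickle: "to rebuild this object, call eval('6 * 7')".
        # Any importable callable and any arguments could be named here.
        return (eval, ("6 * 7",))


blob = pickle.dumps(Payload())

# The unpickler calls eval() while loading; no Payload object comes back.
result = pickle.loads(blob)
print(result)  # 42
```

The receiving side never needs the `Payload` class. The stream only stores a reference to the callable plus its arguments, which is why loading an untrusted pickle amounts to running untrusted code.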

"torch.load() unless weights_only parameter is set to True, uses pickle module implicitly, which is known to be insecure. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling. Never load data that could have come from an untrusted source in an unsafe mode, or that could have been tampered with. Only load data you trust. — PyTorch Docs"

In the real world, some people download weights from third-party sources. Since PyTorch won't sandbox the loading process, I built the tool to inspect the pickle bytecode before it is executed.
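
For the curious, this is roughly the idea, not the actual tool: a sketch that walks the opcodes with the standard library's `pickletools.genops` and never unpickles anything. The `SUSPICIOUS` set and the `scan_pickle` name are my assumptions for this example, and it expects raw pickle bytes.

```python
import pickle
import pickletools

# Opcodes that import a callable or invoke one (assumed "suspicious" set).
SUSPICIOUS = {"GLOBAL", "STACK_GLOBAL", "REDUCE", "INST", "OBJ",
              "NEWOBJ", "NEWOBJ_EX", "BUILD"}


def scan_pickle(data):
    """List (opcode name, argument, byte offset) hits without unpickling."""
    findings = []
    for opcode, arg, pos in pickletools.genops(data):
        if opcode.name in SUSPICIOUS:
            # For STACK_GLOBAL the module and name come from the stack,
            # so arg is None; GLOBAL carries "module name" directly.
            findings.append((opcode.name, arg, pos))
    return findings


plain = pickle.dumps({"weights": [0.1, 0.2]})
print(scan_pickle(plain))  # plain containers and floats: nothing flagged
```

Flagging every hit would be too noisy in practice, since legitimate checkpoints also use these opcodes to rebuild tensors. A usable scanner would more likely compare the imported names against an allow-list.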