by arseniibr 142 days ago
Safetensors solves RCE, but it doesn't solve legal liability. I scan .safetensors because metadata headers often contain restrictive licenses (like CC-BY-NC) that contradict the repo's README. Deploying a non-commercial model in a commercial SaaS is a security/compliance incident, even if no code is executed (PS I'm in the EU and it's important for us).
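
To make the metadata scan concrete, here is a minimal sketch using only the standard library. It relies on the documented safetensors layout (an 8-byte little-endian header length, then a JSON header whose optional `__metadata__` entry is a free-form string map). The marker list, the size cap, and both function names are assumptions for illustration, not part of any real scanner, and a substring match is only a rough first filter:

```python
import json
import struct

# Hypothetical marker list for this sketch; a real policy would be stricter.
NON_COMMERCIAL_MARKERS = ("cc-by-nc", "non-commercial", "noncommercial")

# Arbitrary sanity cap chosen for this sketch so a bogus length field
# cannot make us allocate a huge buffer.
MAX_HEADER_BYTES = 100 * 1024 * 1024


def read_safetensors_metadata(path):
    """Return the free-form __metadata__ map from a .safetensors header.

    Only the JSON header is read; tensor data is never loaded.
    """
    with open(path, "rb") as f:
        prefix = f.read(8)
        if len(prefix) != 8:
            raise ValueError("file too short to be a safetensors file")
        # First 8 bytes: unsigned little-endian 64-bit header length.
        (header_len,) = struct.unpack("<Q", prefix)
        if header_len > MAX_HEADER_BYTES:
            raise ValueError("header length exceeds sanity cap")
        raw = f.read(header_len)
        if len(raw) != header_len:
            raise ValueError("truncated header")
    header = json.loads(raw.decode("utf-8"))
    # __metadata__ is optional, so fall back to an empty map.
    return header.get("__metadata__") or {}


def find_restrictive_license(metadata, markers=NON_COMMERCIAL_MARKERS):
    """Return the metadata entries whose value mentions a marker."""
    hits = {}
    for key, value in metadata.items():
        text = str(value).lower()
        if any(marker in text for marker in markers):
            hits[key] = value
    return hits
```

A pipeline could call `find_restrictive_license(read_safetensors_metadata(path))` and treat any non-empty result as a reason to hold the artifact for human review, since the README and the embedded metadata may disagree.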

Additionally, a massive portion of the ecosystem is still stuck on Pickle/PyTorch .bin.

1 comment

Right, but in these environments (PS, I'm also in the EU and also work in the ecosystem) we don't just deploy 3rd party data willy-nilly. You take some sort of ownership of the data, review and polish it, and then you deploy that. Since security and compliance are important for you, I'm assuming you're doing the same?

And when you're doing that, you have plenty of opportunity to turn Pickle into whatever format you want, since you're holding and owning the data anyways.

Don't you suppose that in a large company with teams of 50+ devs/DS pulling models for experiments, enforcing a manual "review+polish+convert" workflow for every single artifact can create a massive bottleneck and, as a result, shadow IT? Doesn't it make sense to automate the "review" part?

If you run teams with 50+ devs, then you MUST ensure the pipelines actually work for every single project they work on. You don't PATCH validation on top of what seems to already be brittle in your infrastructure.

But I don't manage the infrastructure where you work, so I don't have the full picture. Still, it sounds to me like there is a different issue going on; the issue isn't "Some HF repos use Git LFS so we need a tool to flag those".