| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by arseniibr 142 days ago
	In an ideal local environment with a properly configured git client, sure. But in real-world CI/CD pipelines, people can use wget, curl, or custom caching layers that often pull the raw pointer file instead of the LFS blob. When that hits torch.load() in production, the service crashes. The tool was designed to catch this integrity mismatch before deployment.

1 comments

embedding-shape 141 days ago

Right, but if your CI/CD pipeline is fetching repositories that are using Git LFS while whatever pipeline you're creating/maintaining can't actually handle Git LFS, wouldn't you say that it's the pipeline that would have to be fixed?

Trying to patch your CI builds by adding a tool that scans for licenses, "malware" and other metadata errors on top of all of this feels very much like "the wrong solution", fix the issue at the root instead, the pipeline doing the wrong things.

link

arseniibr 141 days ago

I agree that fixing the pipeline is indeed the correct decision, but I've created this tool to provide the detection.

In a complex environment, you often don't control the upstream ingestion methods used by every team. They might use git lfs, wget, huggingface-cli, or custom caching layers.

Relying solely on the hope that every downstream consumer correctly handles Git LFS is dangerous. This tool acts as a detector to catch those inevitable human or tooling errors before they crash the production.

link

embedding-shape 140 days ago

> This tool acts as a detector to catch those inevitable human or tooling errors before they crash the production.

Again, that sounds like a bigger issue, that a repository using Git LFS can somehow "crash the production", that's where I'd add resilience first. But as mentioned in another comment, I don't have the full view of your infrastructure, maybe it has to work like that for whatever reason, so YMMV.

link