| HN Mirror


by kortex 1353 days ago

I've used both extensively. Git-lfs has always been a nightmare. Because each tracked large file can be in one of two states, binary or "pointer", it's super easy for the working folder to get all fouled up. Git-lfs would then be unable to "clean" or "smudge", since either would cause some conflict. If you accidentally pushed in the wrong state, you could "infect" the remote and be really hosed. I had this happen numerous times over about 2 years of using lfs, and each time the only solution was some aggressive rewriting of history.
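To make the two states concrete, here is a minimal sketch of telling a pointer apart from the real binary. It assumes the current pointer format, where the file starts with the line `version https://git-lfs.github.com/spec/v1`; the helper name `is_lfs_pointer` is made up for illustration and is not part of git-lfs.

```python
from pathlib import Path

# Assumption: pointers written by current git-lfs start with this version line.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path: Path) -> bool:
    """Return True if the file looks like a git-lfs pointer, False if it is real content."""
    with path.open("rb") as fh:
        # Only the start of the file is needed to recognise the version line.
        head = fh.read(len(LFS_POINTER_PREFIX))
    return head == LFS_POINTER_PREFIX
```

Running a check like this over the tracked files before pushing would show which ones are sitting in the pointer state and which are in the binary state.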

That, combined with the pointer (metadata) file re-using the same filename as the binary, meant that it was common for folks to commit the binary itself and push it. Again, it took lots of history rewriting to get the repo size back down.

Maybe solutions to my problems exist, but I spent hours wrestling with it trying to fix these bad states, and it caused me much distress.

Also, configuring the backing store was generally more painful, especially if you needed more than 2GB.

DVC was easy to use from the first moment. The separate meta files meant that it couldn't get into mixed clean/smudge states. If you aren't in a cloud workflow already, the backing store is a bit tricky, but even without AWS I made it work.

1 comment

adhocmobility 1352 days ago

We resolve this in two ways:

1. All git-lfs files are kept in the same folder.

2. No one can push commits directly to one of the main branches; they need to raise a PR. This means that commits go through review, so it's easy to tell if they've accidentally committed a binary, and we can just delete their branch from the remote, bringing the size back down.
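The review step could be partly automated. Below is a rough sketch, not our actual tooling: it walks a checkout and lists files above a size threshold that live outside the single git-lfs folder. The function name `find_stray_large_files`, the folder argument, and the 1 MB default are all assumptions made for illustration.

```python
import os
from pathlib import Path


def find_stray_large_files(repo_root, lfs_dir, max_bytes=1_000_000):
    """List files larger than max_bytes that sit outside the LFS folder."""
    root = Path(repo_root)
    lfs_root = root / lfs_dir
    stray = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = Path(dirpath) / name
            if path.is_symlink() or lfs_root in path.parents:
                continue  # skip symlinks and anything inside the LFS folder
            if path.stat().st_size > max_bytes:
                stray.append(path.relative_to(root))
    return sorted(stray)
```

A non-empty result on a PR branch is a hint that someone committed a binary outside the LFS folder, which is the case the reviewer is looking for.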
