Hacker News
by derefr 2286 days ago
Has anyone used Git submodules to isolate large binary assets into their own repos? Seems like the obvious solution to me. You already get fine-grained control over which submodules you initialize. And, unlike Git LFS, it might be something you’re already using for other reasons.
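A minimal Python sketch of that fine-grained control, assuming the binary assets live in submodules at hypothetical paths such as `assets/textures`. It only builds a `git submodule update --init` command restricted to the listed paths, so submodules that are not listed are never cloned. The function name and paths are illustrative, not part of any existing tool.

```python
import subprocess
import sys
from typing import List, Sequence


def init_selected_submodules_cmd(paths: Sequence[str]) -> List[str]:
    """Build a `git submodule update --init` command limited to `paths`.

    Submodules that are not listed stay uninitialized, so their (possibly
    huge) binary contents are never fetched.
    """
    if not paths:
        raise ValueError("pass at least one submodule path")
    # "--" separates options from the pathspecs that select submodules.
    return ["git", "submodule", "update", "--init", "--", *paths]


if __name__ == "__main__":
    # Hypothetical usage: python init_assets.py assets/textures assets/audio
    subprocess.run(init_selected_submodules_cmd(sys.argv[1:]), check=True)
```

Passing paths explicitly means a developer who only needs code can skip the asset repos entirely.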

Using submodules requires that everyone on your team has at least a vague idea of what's going on and how not to foot-gun themselves. That's hard enough with Git itself. I don't think I've ever seen submodules used without becoming a major pain point.
That is a straight-up non-starter.

Someone was trying to talk me into git subtrees though...

The problem with git submodules is they can't be used like a hyperlink to another repository. Updating the submodule requires updating the superproject as well. The new commits are invisible to the superproject until that is done.
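To make the two-step nature concrete, here is a hedged Python sketch that lists the commands for moving a submodule to a new commit and recording it in the superproject. The path, ref, and commit message are hypothetical placeholders; the point is that the superproject stores a fixed commit ID for the submodule, so nothing changes for other clones until that ID is staged, committed, and pushed.

```python
from typing import List


def bump_submodule_cmds(path: str, ref: str, message: str) -> List[List[str]]:
    """Commands to move submodule `path` to `ref` and record it upstream.

    Checking out a new commit inside the submodule only changes the working
    tree; the superproject keeps pointing at the old commit until the new
    commit ID is staged and committed there.
    """
    return [
        # 1. Update the submodule's own checkout.
        ["git", "-C", path, "fetch", "origin"],
        ["git", "-C", path, "checkout", ref],
        # 2. Stage the submodule's new commit ID in the superproject.
        ["git", "add", path],
        # 3. Commit it; other clones see the new pin only after this is pushed.
        ["git", "commit", "-m", message],
    ]


for cmd in bump_submodule_cmds("assets", "v2.0.0", "Bump assets to v2.0.0"):
    print(" ".join(cmd))
```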

It'd be great if they worked like Python's editable package installations.

Then the state of the superproject would depend on when the checkout occurred. That would be disastrous for consistency, you’d be unable to replicate a checkout later or elsewhere. The state of a repo after a checkout should only depend on the commit that was checked out.
It’s interesting that we’ve never developed the equivalent for Git of what every programming-language ecosystem has: keeping two parallel listings of dependencies, one in terms of version constraints to satisfy, and the other in terms of exact refs.

I could totally see a .gitmodules.reqs file specified in terms of semver specs against tags, or just listing a branch to check out the HEAD of; resolving to the same .gitmodules file we already have. Not even a breaking change!
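A rough sketch of the resolution step such a hypothetical `.gitmodules.reqs` file would need: take a constraint, pick the highest matching tag, and emit the exact commit that would then be pinned, the same role a lockfile plays. Everything here is an assumption for illustration: the `resolve_caret` name, the simplified caret rule (same major version, at least the given version, ignoring npm's special handling of 0.x), and the tag-to-commit mapping (which in practice might come from `git ls-remote --tags`). Tags that are not semver-shaped are skipped, since Git attaches no meaning to tag names.

```python
import re
from typing import Dict, Optional, Tuple

Version = Tuple[int, ...]
_SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag: str) -> Optional[Version]:
    """Return (major, minor, patch) for tags like 'v1.4.2', else None."""
    m = _SEMVER_TAG.match(tag)
    return tuple(int(g) for g in m.groups()) if m else None


def resolve_caret(constraint: str, tags: Dict[str, str]) -> Tuple[str, str]:
    """Resolve a simplified '^X.Y.Z' constraint against a tag -> commit map."""
    if not constraint.startswith("^"):
        raise ValueError("only '^X.Y.Z' constraints are supported in this sketch")
    floor = parse_tag(constraint[1:])
    if floor is None:
        raise ValueError(f"bad constraint: {constraint}")
    candidates = []
    for tag, commit in tags.items():
        v = parse_tag(tag)
        # Skip non-semver tags; keep same-major versions at or above the floor.
        if v is not None and v[0] == floor[0] and v >= floor:
            candidates.append((v, tag, commit))
    if not candidates:
        raise LookupError(f"no tag satisfies {constraint}")
    _, tag, commit = max(candidates)  # highest matching version wins
    return tag, commit


tags = {"v1.2.0": "a1", "v1.3.1": "b2", "v2.0.0": "c3", "nightly": "d4"}
print(resolve_caret("^1.2.0", tags))
```

The resolved commit ID is what would end up in the existing `.gitmodules`/gitlink state, so checkouts stay reproducible while the constraint file records intent.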

It would mean attaching a semantic meaning to tags, but git doesn't do that, ever, for any reference. You don't even have to have a master branch, much less tags that follow semver. Linux doesn't even use semver!
Correct, this feature should be built on top of the source control system, not as part of it.
This is great for vendoring external dependencies that aren't under the developer's control. When the same developer is working on several related but separate projects at the same time, it's too cumbersome.

Would be nice if git submodules could also point to a branch instead of specific commits. That way, the superproject's state would not be modified every time the branch is updated.
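For reference, Git does let `.gitmodules` record a branch per submodule (the `branch` key), which `git submodule update --remote` follows. The superproject still records an exact commit, though, so updating from the branch still shows up as a change to commit. A small Python sketch that renders such an entry; the name, path, URL, and branch are placeholders.

```python
from typing import Optional


def gitmodules_entry(name: str, path: str, url: str, branch: Optional[str] = None) -> str:
    """Render one [submodule "name"] section in .gitmodules syntax."""
    lines = [f'[submodule "{name}"]', f"\tpath = {path}", f"\turl = {url}"]
    if branch is not None:
        # Used by `git submodule update --remote`; the superproject still
        # stores a specific commit for the submodule.
        lines.append(f"\tbranch = {branch}")
    return "\n".join(lines) + "\n"


print(gitmodules_entry("assets", "assets", "https://example.com/assets.git", "main"))
```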

If your software is small enough that a small handful of engineers can keep the whole thing in their heads, you probably don't need submodules. If you have different teams working on different parts of the project, submodules start making more sense.
They can now, with the new-ish git submodule update --init --remote, which follows the branch configured in .gitmodules. But the problem with submodules is that you cannot do a shallow fetch (depth 1), because most hosts won't serve unadvertised refs.
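Why depth 1 breaks: a shallow fetch of a pinned commit works reliably only if that commit is the tip of a ref the server advertises, unless the server is configured to allow fetching other objects (for example `uploadpack.allowReachableSHA1InWant`). A hedged Python heuristic that checks the first condition against `git ls-remote` output (one `<sha>\t<ref>` per line); it cannot see the server's configuration, and the function names are hypothetical.

```python
from typing import Set


def advertised_commits(ls_remote_output: str) -> Set[str]:
    """Collect object IDs from `git ls-remote` output."""
    shas = set()
    for line in ls_remote_output.splitlines():
        if line.strip():
            sha, _, _ref = line.partition("\t")
            shas.add(sha)
    return shas


def pinned_commit_is_advertised(pinned: str, ls_remote_output: str) -> bool:
    """True if the pinned submodule commit is the tip of an advertised ref."""
    return pinned in advertised_commits(ls_remote_output)
```

If this returns False, a `--depth 1` submodule update is likely to fail on a server with default settings, because the pinned commit is buried in history rather than sitting at a ref tip.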
Submodules are almost always the wrong answer. If you need to version huge files, use Git LFS.
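The contrast with LFS in one picture: with Git LFS the repository commits only a small text pointer, while the real content lives in a separate LFS store. This Python sketch builds a pointer in the Git LFS v1 pointer format for some bytes; in practice `git lfs` generates these files through its clean filter, so this is only an illustration of what ends up in history.

```python
import hashlib


def lfs_pointer(data: bytes) -> str:
    """Return the Git LFS v1 pointer text that would stand in for `data`."""
    oid = hashlib.sha256(data).hexdigest()
    return (
        "version https://git-lfs.github.com/spec/v1\n"
        f"oid sha256:{oid}\n"
        f"size {len(data)}\n"
    )


print(lfs_pointer(b"hello"), end="")
```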
I have tried submodules, but it's way too easy to shoot yourself in the foot. Not very sustainable in a team with different levels of Git knowledge.
I’ve done that. Especially if you want specific versions of data to build ML models, this makes a nice audit log for reproducibility
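One way to turn that into an audit log, sketched in Python under some assumptions: a training script captures `git submodule status` output and stores which dataset commit each submodule was at. Each status line starts with a state character (space: checked out at the recorded commit, `-`: not initialized, `+`: checked-out commit differs from the recorded one, `U`: merge conflicts), followed by the commit ID, the path, and an optional description in parentheses. The function name and record layout are hypothetical, and paths containing spaces are not handled.

```python
from typing import Dict


def parse_submodule_status(output: str) -> Dict[str, dict]:
    """Map submodule path -> {"commit": sha, "as_recorded": bool}."""
    record = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        state, rest = line[0], line[1:]
        sha, path = rest.split(" ", 2)[:2]
        # Only a leading space means the checkout matches the superproject,
        # so the run can be reproduced from the superproject commit alone.
        record[path] = {"commit": sha, "as_recorded": state == " "}
    return record
```

Storing this record next to each trained model ties the model to exact data commits.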