Clickbait title. The author literally says "Spoiler alert. I do not hate submodules." and also later recommends submodules for language ecosystems that don't have built-in package managers.
The author also gives (IMO) 2 weak reasons against submodules.
(1) He says it's hard to know which repo you're editing (main repo or submodule). I agree, but in practice this hasn't been an issue for me. A simple `git status` or `pwd` is usually enough to know which repo I'm editing.
(2) The author also says that committing changes with submodules can be confusing since it involves multiple commits: one commit in the submodule, and another in the main repo to update the commit it points to. I agree this is a little confusing at first and definitely tedious, but conceptually I think it is pretty simple.
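For anyone who hasn't seen the two-commit dance spelled out, here is a minimal sketch using throwaway repos in a temp directory. The repo names (`lib`, `main`) and file names are made up for illustration, and the `protocol.file.allow` override is only there because the demo clones the submodule from a local path.

```shell
#!/bin/sh
# Sketch: changing code inside a submodule takes two commits.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)

# A throwaway "library" repo that will become the submodule.
git init -q "$tmp/lib"
echo "v1" > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: initial commit"

# A throwaway main repo that embeds the library at ./lib.
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib submodule"

# Commit 1: made inside the submodule's own repository.
echo "v2" > "$tmp/main/lib/lib.txt"
git -C "$tmp/main/lib" commit -q -am "lib: bump to v2"

# Commit 2: made in the main repo, recording the submodule's new commit.
git -C "$tmp/main" add lib
git -C "$tmp/main" commit -q -m "main: point lib at v2"

# The main repo's tree stores the submodule as a commit ID (mode 160000).
pinned=$(git -C "$tmp/main" ls-tree HEAD lib | awk '{print $3}')
actual=$(git -C "$tmp/main/lib" rev-parse HEAD)
echo "pinned=$pinned actual=$actual"
```

The second commit contains no file contents from `lib`, only the new commit ID, which is why forgetting it leaves the main repo pointing at the old version.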
That said, I do agree that submodules are confusing -- just for different reasons.
My main gripe with submodules is that they don't work well with the rest of git. Why isn't adding a submodule just a `git add` to a directory with a git repo inside of it? If there are new commits in a submodule, why doesn't `git checkout .` reset it back to the commit the main repo points to? If I clone a repo with submodules, why do I need to run additional submodule commands to get an exact copy of the codebase? Basically, to me it feels like submodules were slapped onto git as an afterthought and little care was taken to think about the git CLI experience as a whole. I think submodules would be a lot less confusing if git had designed a better CLI for them.
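To make the cloning complaint concrete, here is a rough sketch with throwaway repos (the names `lib`, `main`, `plain` and `full` are invented for the demo). It shows that a plain clone leaves the submodule directory empty, and the two usual ways of filling it in. The `protocol.file.allow` override is only needed because everything here lives on the local filesystem.

```shell
#!/bin/sh
# Sketch: a plain clone does not populate submodules.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)

# Set up a library repo and a main repo that uses it as a submodule.
git init -q "$tmp/lib"
echo "hello" > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: initial commit"

git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib submodule"

# Plain clone: lib/ exists but has nothing in it.
git clone -q "$tmp/main" "$tmp/plain"
plain_listing=$(ls -A "$tmp/plain/lib")

# The extra step the README usually has to warn people about.
git -C "$tmp/plain" -c protocol.file.allow=always \
    submodule --quiet update --init --recursive
after_update=$(cat "$tmp/plain/lib/lib.txt")

# Alternative: ask for submodules at clone time.
git -c protocol.file.allow=always clone -q --recurse-submodules \
    "$tmp/main" "$tmp/full"
full_clone=$(cat "$tmp/full/lib/lib.txt")

echo "plain clone listing: '$plain_listing'"
echo "after update: $after_update, recursive clone: $full_clone"
```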
Clickbait title, but still I do hate them for all the reasons you do state - something that should be somewhat transparent and frankly nice is instead a huge boring PITA.
For any project where I could choose not to use them, I choose not to use them.
I've been the git fixer for a few different teams. I want to like submodules, but there's something that doesn't fit my brain the way the rest of git does. It feels half-baked. I think we're still missing the best way to model the problem as a tree of related states.
Agreed. I also want to like submodules but every encounter with them has been an awkward confusing mess.
They're fine if they are completely hidden from human interaction and used in a programmatic way, but the human interface with the concept seems to always be so awkward.
I agree that the UI is half-baked. It feels like they kept on tacking on more and more commands and flags to get to where we are today. But the fundamental data structures the UI operates on are rock solid. I'm actually surprised more software doesn't use git's internals for things besides version control.
This post is really hard to read. It mentions git submodules and then sharply veers into a bunch of seemingly barely-related paragraphs. If each section had one sentence at the top saying something about git submodules it would be a lot more coherent.
> It's like it's purposefully obtuse for no good reason.
At least it's consistent with the rest of git then...
Git is one of those tools I just can't muster the willpower to truly learn. I use SourceTree and hope for the best, and search the web frantically when something weird happens.
I used Mercurial from only the command line for many years, never felt I needed a GUI. But git, there's just something about it.
My common advice is: bite the bullet and learn it. Specifically, delve deeper past the porcelain commands and understand what they do. Also, for a while I stopped using `git pull` and used `git fetch` and manually managed merges and branch tracking (a lot of `git reset --hard origin/branch-name`). Once you get it, you will be a lot more productive. You'll still have to deal with obtuse commands from time to time, but you'll have a better model of what they should do and can check if they are doing it right.
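A minimal sketch of that fetch-then-decide habit, using two throwaway repos (`origin` and `work` are names made up for the demo). Note that `git reset --hard` throws away uncommitted local changes, so this is only an illustration of the mechanics, not a recommendation for every situation.

```shell
#!/bin/sh
# Sketch: fetch first, inspect, then move the local branch by hand.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)

# An upstream repo with one commit, and a working clone of it.
git init -q "$tmp/origin"
echo "one" > "$tmp/origin/file.txt"
git -C "$tmp/origin" add file.txt
git -C "$tmp/origin" commit -q -m "first"
git clone -q "$tmp/origin" "$tmp/work"

# Upstream moves ahead after the clone.
echo "two" >> "$tmp/origin/file.txt"
git -C "$tmp/origin" commit -q -am "second"

cd "$tmp/work"
branch=$(git rev-parse --abbrev-ref HEAD)

# fetch only updates origin/$branch; the working tree is untouched.
git fetch -q origin

# Look at what arrived before deciding what to do with it.
incoming=$(git rev-list --count "HEAD..origin/$branch")
git log --oneline "HEAD..origin/$branch"

# Make the local branch match the remote exactly (discards local changes).
git reset -q --hard "origin/$branch"

local_head=$(git rev-parse HEAD)
remote_head=$(git rev-parse "origin/$branch")
echo "incoming=$incoming local=$local_head remote=$remote_head"
```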
The Git command line interface is confusing, but Git's data structure design — a content addressable store — is relatively simple (and very interesting!).
I went through the book Pro Git three times leading three different training groups, and I was pretty comfortable with it. But what took me to the next level was prepping a talk for Papers We Love San Diego which explores everything that happens inside the `.git` directory when you initialize a repo and perform a couple basic commits: https://www.youtube.com/watch?v=fHSZz_Mx-Uo
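As a small taste of the content-addressable store, here is a sketch using two plumbing commands, `git hash-object` and `git cat-file`, in a scratch repo. The object ID is derived from the content (plus a short header), so the same bytes always produce the same ID.

```shell
#!/bin/sh
# Sketch: git as a content-addressable store.
set -e

tmp=$(mktemp -d)
git init -q "$tmp/repo"
cd "$tmp/repo"

# Store some bytes as a blob; git prints the ID derived from the content.
id=$(printf 'hello\n' | git hash-object -w --stdin)

# Hashing the same bytes again (without writing) gives the same ID.
id_again=$(printf 'hello\n' | git hash-object --stdin)

# Look the object up by its ID: first its type, then its content.
kind=$(git cat-file -t "$id")
content=$(git cat-file -p "$id")

echo "id=$id kind=$kind content=$content"
```

No commit, branch, or file name is involved here; trees and commits are layered on top of the same store.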
I guess that's part of the issue: I just don't interact with git enough that it matters much.
I might spend half an hour every now and then trying to figure out the right incantation for a specific issue, but overall SourceTree just tells git what it needs to hear and I can carry on with what I really want: code.
Sure, but I might argue that you'll spend _even less time_, in aggregate, with more productivity if you understand the tool better. This is particularly useful when, for example, you're good at saving your peers time by properly rebasing and/or cherry-picking your commits.
That doesn't really make sense to me. `git submodule` is for managing (adding, removing, updating) the submodule, not for doing any random git operation on the submodule.
I understand pull in this context would be a submodule command, not a git command, but why use different terminology from git for what is the same idea?
I routinely encounter problems where the file structure on disk gets out of sync with the .gitmodules file, or one of the internal files inside .git, and I need to completely recreate my local repo.
I dislike having to warn in the README file, "somewhere in this repo are submodules, and you need to use this incantation to clone them."
I think you can now have the submodule reference point to a branch but it can't point to a tag, which is what I want most of the time.
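As far as I understand it, the main repo always records a submodule as a plain commit ID; a branch set with `git submodule add -b <branch>` only influences what `git submodule update --remote` fetches. So the usual workaround for tags is to check the tag out inside the submodule and commit the resulting pointer. A sketch with throwaway repos (the names and the `v1.0` tag are invented for the demo):

```shell
#!/bin/sh
# Sketch: pinning a submodule to a tagged commit.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)

# Library repo: a tagged release followed by newer, untagged work.
git init -q "$tmp/lib"
echo "v1" > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: release"
git -C "$tmp/lib" tag v1.0
echo "wip" >> "$tmp/lib/lib.txt"
git -C "$tmp/lib" commit -q -am "lib: work in progress"

# Main repo: adding the submodule pins whatever commit was cloned (the newest).
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib submodule"

# Move the submodule to the tag, then record that commit in the main repo.
git -C "$tmp/main/lib" checkout -q v1.0
git -C "$tmp/main" add lib
git -C "$tmp/main" commit -q -m "main: pin lib to v1.0"

pinned=$(git -C "$tmp/main" ls-tree HEAD lib | awk '{print $3}')
tagged=$(git -C "$tmp/lib" rev-parse 'v1.0^{commit}')
echo "pinned=$pinned tagged=$tagged"
```

The tag name itself is not stored anywhere in the main repo, which is probably the part that feels missing.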
And there's how `git status` just reports "modified content" without going into details.
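One way to get the missing details is to run `git status` inside each submodule via `git submodule foreach`. A rough sketch with throwaway repos (names invented for the demo):

```shell
#!/bin/sh
# Sketch: seeing what "modified content" actually refers to.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)

git init -q "$tmp/lib"
echo "v1" > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: initial commit"

git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib submodule"

# Dirty the submodule's working tree without committing anything.
echo "local edit" >> "$tmp/main/lib/lib.txt"

# From the main repo, status only says that lib has changed somehow.
git -C "$tmp/main" status --short

# Running status inside every submodule shows the actual files.
details=$(git -C "$tmp/main" submodule --quiet foreach 'git status --short')
echo "$details"
```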
I've been having decent luck managing my dotfiles' dependencies with `git vendor`, which is a nice porcelain around subtrees. The big win is that it keeps track of your subtrees for you and presents a sufficiently-simple interface for adding, removing, and updating vendored dependencies: `git vendor add <name> <repo URI> <path>` to create, `git vendor update <name> <commit-spec>` to update, `git vendor list` to see what you have installed and where. I haven't yet had to do any dev work on my vendored deps, but I suspect it wouldn't be much more difficult than anything else I've done with it.
That sounds promising, but are we talking about https://github.com/brettlangdon/git-vendor ? It looks like it's been abandoned since 2016 and none of the forks on GitHub look like they have much traction. This looks even less like something I'd want to base my workflow off of than git-subrepo.
Ugh, hadn't noticed the author went AWOL. Yeah, that's it. I've been using it without apparent issue at least that long. Now I'm not sure if I'd recommend using it, agreed. On the one hand, unmaintained software is always sketchy. On the other hand, it's actually only like two hundred lines of bash, so worst-case you just fork it and pretend you wrote a custom script to manage your subtrees like everyone else does? :/
I guess my concern about subtrees is that they seem like they can result in unnecessary duplication / use of disk space. But the ability to work offline with a complete copy of the code seems worth the tradeoff in most cases, and git-lfs should help deal with large file sizes.
Have you run into any "slow push speeds" with subtrees, as the person complains about in the first article I linked?
No, I haven't run into any such issue, though I have only used it with small to medium sized repositories. The repo simply grows linearly with the size of the secondary repository. With a submodule, you would have to clone it just the same. In fact, you might enjoy some benefits from cross-compression by using subtrees that would not be available to submodules, and it's faster to reuse the same clone connection you were already getting the source from.
I tried subrepo and ran into some dumb error right away. I went with subtree and it worked well, but I needed to write a script so others could use it to update (the system already had one, it just didn't work well). I can't claim it's okay to go without a script, the way nobody needs a script for `git pull`.
My concern with subrepo is that I have no idea who is developing it and how much resources they have. I'd hate to learn to use a tool central to my development workflow and then have it discontinued.
Is git-subrepo good enough that it's worth the risk of using something not built into git (and the hassle of installing something extra)?
For a minute I confused this with another external tool: git-filter-repo [0]. It's recommended by the official manual as replacement for git-filter-branch [1].
I use them (occasionally), but only when the pain of not using them would be greater, because they are a blue-assed bitch.
But they really do enforce a very strict version control. If we want to be absolutely sure of a version, Git submodules will give that to you.
But I only use them in one PHP backend project that I plan to barely ever change, because every change means that I have to crawl through a raft of repos, updating a submodule chain.
It's exactly the kind of operation that calls for a scripting solution, but it is also one of those projects, where I change it so seldom, that it isn't worth it to write a script.
For my frontend (Swift) work, I use SPM (Swift Package Manager). Much easier on my nerves.
I've been maintaining a git repo at my current and previous employer. Both repos started out with submodules, and in both the submodules got axed due to added complexity and cognitive load. After removal, they were not missed. Case closed.
I can only speak for myself. In the roughly 12 years I've been using git at multiple different companies and for my own projects, I have never used git submodules. I feel like I have a good, well rounded set of experiences too. Code bases big and small, massive monorepos, and many microrepos all separate. In none of them have we used submodules.
That being said, maybe submodules would have solved some problem here or there that we had. I'm open to arguments in favor of them in that case. But we've always been able to get the work done without them, so I don't personally think they're indispensable.
The biggest pain point for me has been that they're basically incompatible with `git worktree`. By default they'll just be cloned from upstream in the new worktree, thereby defeating the purpose of using worktrees in the first place.
I used to have a bunch of hacky scripts for working around this, but lately I've just been giving up and avoiding submodules as much as possible.
I was hoping to see better solutions to this problem in the comments, but the paucity of a solid solution means the problem isn't solved yet. I agree with `qznc` below, in that once maturity is achieved, the monorepo should x-furcate and create packages, but up until that point there really is no clean solution for this relational concept.
I believe it is a question of maturity. If the subcomponent is mature enough, then turning it into a package is fine. The cost is that patching the subcomponent and testing the main component takes more effort (though you can script it). With submodules at least the build and test cycle is as quick as if it is in the same repo.
So there is a scale from quicker iteration to less coupled: same repo, submodule, different package. The question is in which cases the middle step "submodule" is worth it or if you should rather switch to one of the others always.
Exactly what I am thinking. My use case is to have a shared api repository containing an IDL (here protobuf/grpc) and you hook it up as submodules for iOS, android, Golang.
The round trip time is reduced compared to creating and publishing modules for each generated code.
This makes it far easier to experiment with a new api feature so one gets a better feeling how this behaves in each language.
And if the api is mature enough you might as well change your development flow into publishing modules instead of using git submodules.
Therefore I like submodules, and I would argue that people underestimate the increased round trip time for publishing modules and then referencing them in code, compared to using submodules.
At least it is better than subtrees, which cannot tell you what they are even if you ask.
Also, ClearCase isn't atomic beyond file level even... you use labels to make "versions".
About the point of it not being obvious where you're working: I find using zsh or another shell that prints git information as part of the shell prompt helps immensely with this.
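If you don't want a full prompt framework, a small function gets most of the way there. This is a sketch only: `repo_label` is a made-up helper name, and it relies on `git rev-parse --show-superproject-working-tree`, which prints the parent repo's path when you are inside a submodule and nothing otherwise.

```shell
#!/bin/sh
# Sketch: tell a submodule apart from a top-level repo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: prints "repo:<name>" or "submodule:<name>".
repo_label() {
    top=$(git rev-parse --show-toplevel 2>/dev/null) || return 0
    super=$(git rev-parse --show-superproject-working-tree 2>/dev/null) || true
    if [ -n "$super" ]; then
        printf 'submodule:%s' "$(basename "$top")"
    else
        printf 'repo:%s' "$(basename "$top")"
    fi
}

# Demo setup: a main repo with one submodule, both throwaway.
tmp=$(mktemp -d)
git init -q "$tmp/lib"
echo "v1" > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: initial commit"
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib submodule"

in_main=$(cd "$tmp/main" && repo_label)
in_sub=$(cd "$tmp/main/lib" && repo_label)
echo "$in_main / $in_sub"
```

In bash, something like `PS1='$(repo_label) \$ '` should put the label into the prompt; zsh needs prompt substitution enabled for the same trick.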
I don't see the point of submodules. If you want to make a change in another repo, do that. If you want to reference a specific version of another package in your code, do that. When would I want submodules?
I have a FOSS project[0] that uses submodules for two separate reasons:
In the first[1], it's to include the project's JS implementation in the project's website. Considering that this would be the only JS code running on the entire site, it seemed like overkill to throw in some newfangled asset manager like Bower just for a single NPM package.
In the second[2], it's to include the encoding/decoding test cases alongside the implementations. This way, instead of having to maintain a bunch of independent per-implementation unit tests, I can maintain all the tests in one place, and then have the per-implementation test suites snarf the test cases, and I then know with reasonable certainty that all my implementation libraries have equivalent behavior.
There are probably other, "better" ways to do both these things - I could bite the bullet and use Bower for the website, and I could have test suites download test cases on-the-fly - but submodules were the path of least resistance, and I've yet to encounter any significant downsides.
When you have a library that is useful to more than one project, but not popular enough to have its own package on multiple package managers. Then the easiest way to reference a specific version of the library is to submodule it.
The main thing I learned is if you mess up any part of your submodule during creation - do not try to fix it. Just delete it from the parent repo and start over.
Also do not bother deleting it using git commands. Delete it in the .gitmodules file, then search your .git folder for every reference of the repo you want to delete (including folders named after it) and delete everything.
Either that or start with a clean parent repo clone.
> Spoiler alert. I do not hate submodules. I do how ever have an instant oh oh response when people mention they want to solve a problem with Git submodules.
If you can wrap your build in Nix, I highly recommend it. The upcoming "flakes" feature handles pinning Git-hosted dependencies and locking revisions, even if they aren't Nix flakes themselves.
If you get to a point where you think you absolutely need git submodules... just switch to svn. So much pain and misery can be avoided if you just use the right tool for the right job, and SVN handles the git submodule use-case effortlessly (for C++ at least).
For languages with proper package management (ruby, python, go, node, etc...) put in the extra effort to utilize your package manager to update your dependencies instead of bothering with submodules. If you're still set on doing submodules, I'm willing to bet you're just "doing it wrong" (TM).
Subversion is way easier to use than git, and it takes minutes to setup a server. Pushing your project history is just a few steps with git-svn, though it may take a while depending on the size of your project.
TortoiseSVN is probably the most straightforward and easy to use version control GUI there is.
My point is that it's easier to manage and maintain a single version control system.
> Subversion is way easier to use than git
That's subjective; I'm more familiar with git, so it's easier for me.
> and it takes minutes to setup a server.
Git doesn't even require a server to use. You can create local repos, you can pull/push to remote repos via SSH/HTTPS/etc. No specialized server software needed.
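To illustrate the no-server point, here is a minimal sketch where the "remote" is just a bare repository in a local directory. The names (`shared.git`, `alice`, `bob`) are invented for the demo; the same commands work when the path is an SSH or HTTPS URL instead.

```shell
#!/bin/sh
# Sketch: push and clone through a plain directory, no server software.
set -e
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

tmp=$(mktemp -d)

# A bare repo is just a directory of git data with no working tree.
git init -q --bare "$tmp/shared.git"

# One person creates some history and pushes it there.
git init -q "$tmp/alice"
cd "$tmp/alice"
echo "hi" > readme.txt
git add readme.txt
git commit -q -m "first"
git remote add origin "$tmp/shared.git"
git push -q origin HEAD

# Someone else clones from the same directory.
git clone -q "$tmp/shared.git" "$tmp/bob"
cloned=$(cat "$tmp/bob/readme.txt")
echo "bob sees: $cloned"
```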
I also hate monorepos. I consider the monorepo to be an anti-pattern.
It's an architectural advantage to separate each module into a different repo as it encourages careful separation of concerns.
If you find that you often need to update many modules together every time you want to add a new feature to your project, this is often an indication that your modules do not have proper separation of concerns and your abstractions are leaking. It means your project exhibits low cohesion and/or tight coupling between modules.
The difficulty in maintaining separate module dependencies is actually a very useful signal to you as a developer that your code is too tightly coupled and needs to be refactored into modules which are more independent.
Monorepos are a bandaid patch solution which covers up the root problem. The real problem is incorrect separation of concerns, AKA low cohesion which leads to tight coupling between your components.
It's not possible to design simple interfaces between components when these components have overlapping responsibilities.
- in 2 repos -> 2 PRs -> 2 test suites -> 2 code reviews
- in a monorepo -> 1 PR -> 1 test suite -> 1 code review
When your project grows in complexity, there are some concerns that cross the boundaries of your repositories (CI/CD pipelines, testing and QA being a few examples).
Having a monorepo helps.
Consider having all your docker images and helm charts alongside the source code of the many parts of your big project. Is that really an anti-pattern?
EDIT: also, for a new dev joining the team, having to clone only one repository is easier. I also try to have a simple docker-compose stack so they have only one command to spin up the whole dev environment.
>> When your project grows in complexity, there are some concerns that cross the boundaries of your repositories
The notion of 'cross-cutting concerns' is also an anti-pattern. It's a violation of the 'separation of concerns' principle. A violation of the 'cross-cutting' kind, to be exact.
There are almost always better alternative solutions which don't involve cross-cutting concerns but which require a slightly more carefully thought out architecture.
When it comes to testing, I agree that (for example) integration tests are extremely valuable but I disagree that having the source code of your dependencies in the same repo yields any benefits for integration testing.
Ideally each module dependency should have its own set of tests which test its features based on the appropriate level of abstraction. Dependencies should be more 'general purpose' (suit more different use cases) while higher level logic should be more fitted to the specific business domain. Integration tests should not test the implementation of module dependencies; dependencies should have their own tests.
Higher level tests can sometimes help to uncover issues in dependencies and thus help you to design the tests of those dependencies but keeping them separate is essential because the dependencies should represent a completely different level of abstraction.
You don't want to end up tightly coupling the tests of the main project with the implementation details of its dependencies. Separating the tests correctly helps you to ensure that the scope of your tests is limited to the correct level of abstraction.
My point is that while it's desirable to integration-test a project with its dependencies plugged into it, those tests should not reference any specific implementation details of those dependencies... Because, otherwise, unrelated changes in the implementation of the dependencies are likely to break your higher level tests (which should not be the case); code changes within dependencies should only break your higher level tests if those changes affect higher level behavior.
For example, changing method names and arguments of a dependency should not break your top level integration tests (assuming you've made the matching code changes in your main project source, you shouldn't need to change the top level integration tests at all, they should still pass), the top level tests shouldn't care what the method names of dependencies are and they especially shouldn't care about how those methods are implemented.