I Hate Git Submodules | HN Mirror

	I Hate Git Submodules (abildskov.io)
	108 points by Addono 1913 days ago

26 comments

zffr 1913 days ago

Clickbait title. The author literally says "Spoiler alert. I do not hate submodules." and also later recommends submodules for language ecosystems that don't have built-in package managers.

The author also gives (IMO) 2 weak reasons against submodules.

(1) He says it's hard to know which repo you're editing (main repo or submodule). I agree, but in practice this hasn't been an issue for me. A simple `git status` or `pwd` is usually enough to know which repo I'm editing.

(2) The author also says that committing changes with submodules can be confusing since it involves multiple commits: one commit in the submodule, and another in the main repo to update the commit it points to. I agree this is a little confusing at first and definitely tedious, but conceptually I think it is pretty simple.
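
A minimal sketch of that two-commit flow, using throwaway local repositories. The paths, file names and commit messages are made up for illustration, and `protocol.file.allow=always` is only there because the demo submodule lives at a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# Throwaway library repo, embedded in a main repo as the submodule "lib".
git init -q "$tmp/lib"
echo v1 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: v1"
git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib"

# Commit 1: the change itself, made inside the submodule.
echo v2 > "$tmp/main/lib/lib.txt"
git -C "$tmp/main/lib" commit -q -am "lib: v2"

# Commit 2: the main repo records which submodule commit it now points to.
git -C "$tmp/main" add lib
git -C "$tmp/main" commit -q -m "main: bump lib to v2"

pinned=$(git -C "$tmp/main" rev-parse HEAD:lib)
actual=$(git -C "$tmp/main/lib" rev-parse HEAD)
echo "main pins lib at $pinned"
```

In a real project the submodule commit would also need to be pushed before the main repo's commit, otherwise other clones cannot fetch the pinned revision.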

That said, I do agree that submodules are confusing -- just for different reasons.

My main gripe with submodules is that they don't work well with the rest of git. Why isn't adding a submodule just a `git add` to a directory with a git repo inside of it? If there are new commits in a submodule, why doesn't `git checkout .` reset it back to the commit the main repo points to? If I clone a repo with submodules, why do I need to run additional submodule commands to get an exact copy of the codebase? Basically, to me it feels like submodules were slapped onto git as an afterthought and little care was taken to think about the git CLI experience as a whole. I think submodules would be a lot less confusing if git had designed a better CLI for it.

fishywang 1913 days ago

>If there are new commits in a submodule, why doesn't `git checkout .` reset it back to the commit the main repo points to?

>If I clone a repo with submodules, why do I need to run additional submodule commands to get an exact copy of the codebase?

`git config [--global] submodule.recurse true` resolves both questions.

As to why is that not the default? I _believe_ there are security concerns, but I'm not 100% sure.
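
A small sketch of that setting, applied to a throwaway repository rather than globally. One caveat, going by the `git config` documentation: `submodule.recurse` covers commands such as `checkout`, `fetch` and `pull`, but not `clone`, so a fresh clone still wants the explicit flag. The clone URL below is a placeholder.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"

# Per-repository here; add --global to apply it to every repository.
git -C "$tmp/repo" config submodule.recurse true
recurse=$(git -C "$tmp/repo" config --get submodule.recurse)
echo "submodule.recurse=$recurse"

# clone is not covered by submodule.recurse, so ask for submodules explicitly
# (placeholder URL, hence commented out):
# git clone --recurse-submodules https://example.com/some/repo.git
```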

hobs 1913 days ago

Clickbait title, but still I do hate them for all the reasons you do state - something that should be somewhat transparent and frankly nice is instead a huge boring PITA.

For any project where I could choose not to use them, I choose not to use them.

erik_seaberg 1913 days ago

I've been the git fixer for a few different teams. I want to like submodules, but there's something that doesn't fit my brain the way the rest of git does. It feels half-baked. I think we're still missing the best way to model the problem as a tree of related states.

Pxtl 1913 days ago

They feel like an unfinished tool provided by a third-party, not a built-in component of git.

The simple fact that switching branches doesn't auto update the submodule is awful.

kasbah 1913 days ago

You can change that behavior now in newer versions of git:

    git config --global submodule.recurse true

https://stackoverflow.com/questions/1899792/why-is-git-submo...

colechristensen 1913 days ago

Agreed. I also want to like submodules but every encounter with them has been an awkward confusing mess.

They're fine if they are completely hidden from human interaction and used in a programmatic way, but the human interface with the concept seems to always be so awkward.

neolog 1913 days ago

Everything about git feels half-baked tbh

spicybright 1913 days ago

I agree with a half baked UI. It feels like they kept on tacking on more and more commands and flags to get to where we are today. But the fundamental data structures the UI operates on are rock solid though. I'm actually surprised more software doesn't use git's internals for things besides version control.

ChrisMarshallNY 1913 days ago

My understanding was that they were designed by someone that likes monorepos.

ajkjk 1913 days ago

This post is really hard to read. It mentions git submodules and then sharply veers into a bunch of seemingly barely-related paragraphs. If each section had one sentence at the top saying something about git submodules it would be a lot more coherent.

nooyurrsdey 1912 days ago

I felt the same way. It didn't feel coherent at all.

ajkjk 1912 days ago

I wondered if it was GPT-generated but asking that seemed rude.

snarfy 1913 days ago

You spend all this time figuring out how to make git do what you want using this general format:

    git <git-command> [args]

You would expect to do the same thing with a submodule, it would be:

    git submodule <git-command> [args]

But no, submodules have their own set of commands. It's like it's purposefully obtuse for no good reason.

magicalhippo 1913 days ago

> It's like it's purposefully obtuse for no good reason.

At least it's consistent with the rest of git then...

Git is one of those tools I just can't muster the willpower to truly learn. I use SourceTree and hope for the best, and search the web frantically when something weird happens.

I used Mercurial from only the command line for many years, never felt I needed a GUI. But git, there's just something about it.

ghayes 1913 days ago

My common advice is: bite the bullet and learn it. Specifically, delve deeper past the porcelain commands and understand what they do. Also, for a while I stopped using “git pull” and used “git fetch” and manually managed merges and branch tracking (a lot of “git reset --hard origin/branch-name”). Once you get it, you will be a lot more productive. You’ll still have to deal with obtuse commands from time to time, but you’ll have a better model of what they should do and can check if they are doing it right.
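
For anyone wanting to try the fetch-then-decide habit, here is a rough sketch against two throwaway local repositories. The repository names and file contents are invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
origin="$tmp/origin"
work="$tmp/work"

git init -q "$origin"
echo one > "$origin/file.txt"
git -C "$origin" add file.txt
git -C "$origin" commit -q -m "first"
git clone -q "$origin" "$work"

# Upstream gains a commit after we cloned.
echo two >> "$origin/file.txt"
git -C "$origin" commit -q -am "second"

branch=$(git -C "$work" symbolic-ref --short HEAD)
git -C "$work" fetch -q origin                        # updates origin/* only
git -C "$work" log --oneline "HEAD..origin/$branch"   # review what arrived
git -C "$work" merge -q --ff-only "origin/$branch"    # integrate explicitly
# To throw away local commits instead: git reset --hard "origin/$branch"
```

The point of splitting `pull` into these steps is that nothing touches the local branch until the incoming commits have been looked at.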

rectang 1913 days ago

The Git command line interface is confusing, but Git's data structure design — a content addressable store — is relatively simple (and very interesting!).

I went through the book Pro Git three times leading three different training groups, and I was pretty comfortable with it. But what took me to the next level was prepping a talk for Papers We Love San Diego which explores everything that happens inside the `.git` directory when you initialize a repo and perform a couple basic commits: https://www.youtube.com/watch?v=fHSZz_Mx-Uo
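
A tiny sketch of the content-addressable idea, assuming a fresh throwaway repository with Git's default object format. It stores a blob, finds it on disk under its own hash, and reads it back by name.

```shell
set -eu
tmp=$(mktemp -d)
git init -q "$tmp/repo"

# Store a blob; its name is a hash of a small header plus the content.
oid=$(printf 'hello\n' | git -C "$tmp/repo" hash-object -w --stdin)
echo "$oid"

# Loose objects live under .git/objects/<first 2 hex chars>/<the rest>.
prefix=$(printf '%s' "$oid" | cut -c1-2)
rest=$(printf '%s' "$oid" | cut -c3-)
ls "$tmp/repo/.git/objects/$prefix/$rest"

# Read it back by name: first the type, then the content.
kind=$(git -C "$tmp/repo" cat-file -t "$oid")
content=$(git -C "$tmp/repo" cat-file -p "$oid")
echo "$kind: $content"
```

Hashing the same bytes again yields the same name, which is why identical files are stored only once.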

magicalhippo 1912 days ago

I guess that's part of the issue. I just don't interact with git enough that it matters much.

I might spend half an hour every now and then trying to figure out the right incantation for this specific issue, but overall SourceTree just tells git what it needs to hear and I can carry on with what I really want: code.

ghayes 1912 days ago

Sure, but I might argue that you'll spend _even less time_, in aggregate, with more productivity if you understand the tool better. This is particularly useful when, for example, you're good at saving your peers time by properly rebasing and/or cherry-picking your commits.

kelnos 1913 days ago

That doesn't really make sense to me. `git submodule` is for managing (adding, removing, updating) the submodule, not for doing any random git operation on the submodule.

snarfy 1912 days ago

Why is it

    git submodule update

and not

    git submodule pull

I understand pull in this context would be a submodule command, not a git command, but why use different terminology from git for what is the same idea?
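
One possible reason for the different name, sketched below with throwaway repositories: plain `git submodule update` checks out whatever commit the main repo has recorded, which is not what `pull` does. The pull-like behavior is behind `--remote`. Paths and names are invented, and the `g` wrapper exists only because local-path submodules need the file transport allowed in recent Git.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
g() { git -c protocol.file.allow=always "$@"; }

git init -q "$tmp/lib"
echo v1 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: v1"
v1=$(git -C "$tmp/lib" rev-parse HEAD)
branch=$(git -C "$tmp/lib" symbolic-ref --short HEAD)

git init -q "$tmp/main"
g -C "$tmp/main" submodule --quiet add -b "$branch" "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: pin lib at v1"

# Upstream moves on after the pin was recorded.
echo v2 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" commit -q -am "lib: v2"
v2=$(git -C "$tmp/lib" rev-parse HEAD)

# "update" restores the recorded commit; it does not pull anything new.
g -C "$tmp/main" submodule --quiet update --init
after_update=$(git -C "$tmp/main/lib" rev-parse HEAD)

# "--remote" is the pull-like variant: fetch, then move to the branch tip.
g -C "$tmp/main" submodule --quiet update --remote
after_remote=$(git -C "$tmp/main/lib" rev-parse HEAD)
```

After `--remote` the main repo still records the old commit until the new pointer is added and committed there.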

rgovostes 1913 days ago

I routinely encounter problems where the file structure on disk gets out of sync with the .gitmodules file, or one of the internal files inside .git, and I need to completely recreate my local repo.

I dislike having to warn in the README file, "somewhere in this repo are submodules, and you need to use this incantation to clone them."

I think you can now have the submodule reference point to a branch but it can't point to a tag, which is what I want most of the time.

And there's how `git status` just reports "modified content" without going into details.
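
On the tag point, a workaround sketch: check the tag out inside the submodule and commit the resulting pointer in the main repo. What gets recorded is the tag's commit, not the tag name, so this is a pin rather than a reference that follows the tag. The repositories, the tag name `v1.0` and the `g` wrapper (needed only for local-path submodules) are assumptions for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)
g() { git -c protocol.file.allow=always "$@"; }

git init -q "$tmp/lib"
echo v1 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: v1"
git -C "$tmp/lib" tag v1.0
echo v2 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" commit -q -am "lib: v2"

git init -q "$tmp/main"
g -C "$tmp/main" submodule --quiet add "$tmp/lib" lib   # starts at lib's tip

# Pin the tag: check it out in the submodule, then record that commit.
git -C "$tmp/main/lib" checkout -q v1.0
git -C "$tmp/main" add lib
git -C "$tmp/main" commit -q -m "main: pin lib at v1.0"

pinned=$(git -C "$tmp/main" rev-parse HEAD:lib)
tagged=$(git -C "$tmp/lib" rev-parse 'v1.0^{commit}')
```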

junon 1913 days ago

Moving submodules is indeed a PITA, but you don't have to recreate your whole repo.

The "correct" (albeit still annoying) way:

- git submodule deinit <path/to/submodule>

- git rm -f <path/to/submodule>

- git submodule add ... new/path
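
As far as I know, Git 1.8.5 and later can also move a submodule with a plain `git mv`, which rewrites the path in `.gitmodules` along the way. A sketch with throwaway repositories follows. The names `lib` and `third_party` are invented, and the `-c protocol.file.allow=always` option is only needed because the demo submodule is a local path.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/lib"
echo v1 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: v1"

git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib"

# Move the submodule; the .gitmodules change is staged automatically.
git -C "$tmp/main" mv lib third_party
git -C "$tmp/main" commit -q -m "main: move lib to third_party"

new_path=$(git -C "$tmp/main" config -f .gitmodules --get submodule.lib.path)
echo "submodule 'lib' now lives at $new_path"
```

The submodule keeps its original name (`lib`) in `.gitmodules`; only its path changes.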

In an emergency situation, you can almost always recover. I've not corrupted a repo in almost 10 years and I do some unspeakable things to them :)

To remove a module manually:

- git reset . (from root; NOT --hard)

- Remove the <path/to/submodule> from working directory.

- Remove entry from .gitmodules

- Remove entry from .git/config

- Remove (-rf) the .git/modules/<path/to/submodule> directory (it follows the same structure as the working directory)

- git add -A .gitmodules <path/to/submodule> (Tab completion might not work but the command will)

This can be used to forcefully remove a submodule.
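
For comparison, the same removal can usually be done with porcelain commands instead of editing files by hand. This is a sketch against throwaway repositories, not a replacement for the manual steps when the repo is already in a broken state. Names are invented, and `protocol.file.allow=always` is only for the local-path demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/lib"
echo v1 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: v1"

git init -q "$tmp/main"
git -C "$tmp/main" -c protocol.file.allow=always submodule --quiet add "$tmp/lib" lib
git -C "$tmp/main" commit -q -m "main: add lib"

git -C "$tmp/main" submodule --quiet deinit -f lib   # drop it from .git/config
git -C "$tmp/main" rm -q -f lib                      # remove gitlink and .gitmodules entry
rm -rf "$tmp/main/.git/modules/lib"                  # delete the cached clone
git -C "$tmp/main" commit -q -m "main: remove lib submodule"

left_in_index=$(git -C "$tmp/main" ls-files lib)
left_in_gitmodules=$(git -C "$tmp/main" config -f .gitmodules --get submodule.lib.url || true)
```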

To remove all submodules without starting over (I've personally never needed this in the last X years):

- Remove all working directory paths for each submodule

- Remove .gitmodules

- Remove .git/modules/

- Remove any mention of submodules in .git/config

- git add -A

Usually the first manual one solves whatever problem you're facing.

EDIT: I have no idea how to format HN comments, sorry :| nothing I try works. Hope it's readable.

skyfaller 1913 days ago

What do people think of git subtrees?

https://codewinsarguments.co/2016/05/01/git-submodules-vs-gi...

Or the non-standard git-subrepo?

https://github.com/ingydotnet/git-subrepo

saulrh 1913 days ago

I've been having decent luck managing my dotfiles' dependencies with `git vendor`, which is a nice porcelain around subtrees. The big win is that it keeps track of your subtrees for you and presents a sufficiently-simple interface for adding, removing, and updating vendored dependencies: `git vendor add <name> <repo URI> <path>` to create, `git vendor update <name> <commit-spec>` to update, `git vendor list` to see what you have installed and where. I haven't yet had to do any dev work on my vendored deps, but I suspect it wouldn't be much more difficult than anything else I've done with it.

skyfaller 1913 days ago

That sounds promising, but are we talking about https://github.com/brettlangdon/git-vendor ? It looks like it's been abandoned since 2016 and none of the forks on GitHub look like they have much traction. This looks even less like something I'd want to base my workflow off of than git-subrepo.

saulrh 1913 days ago

Ugh, hadn't noticed the author went AWOL. Yeah, that's it. I've been using it without apparent issue at least that long. Now I'm not sure if I'd recommend using it, agreed. On the one hand, unmaintained software is always sketchy. On the other hand, it's actually only like two hundred lines of bash, so worst-case you just fork it and pretend you wrote a custom script to manage your subtrees like everyone else does? :/

ddevault 1913 days ago

I came here to recommend subtrees. They're pretty good.

skyfaller 1913 days ago

I guess my concern about subtrees is that they seem like they can result in unnecessary duplication / use of disk space. But the ability to work offline with a complete copy of the code seems worth the tradeoff in most cases, and git-lfs should help deal with large file sizes.

Have you run into any "slow push speeds" with subtrees, as the person complains about in the first article I linked?

ddevault 1913 days ago

No, I haven't run into any such issue, though I have only used it with small to medium sized repositories. The repo simply grows linearly with the size of the secondary repository. With a submodule, you would have to clone it just the same. In fact, you might enjoy some benefits from cross-compression by using subtrees that would not be available to submodules, and it's faster to reuse the same clone connection you were already getting the source from.
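
For readers who have not seen it, a minimal sketch of what adding a subtree looks like, assuming `git subtree` is installed (it ships in Git's contrib area, so some installs lack it). The repositories and the `vendor/lib` prefix are made up for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

git init -q "$tmp/lib"
echo v1 > "$tmp/lib/lib.txt"
git -C "$tmp/lib" add lib.txt
git -C "$tmp/lib" commit -q -m "lib: v1"
branch=$(git -C "$tmp/lib" symbolic-ref --short HEAD)

git init -q "$tmp/main"
echo main > "$tmp/main/README"
git -C "$tmp/main" add README
git -C "$tmp/main" commit -q -m "main: initial commit"

# Copy lib's files into vendor/lib as ordinary tracked files (history squashed).
git -C "$tmp/main" subtree add --prefix=vendor/lib --squash "$tmp/lib" "$branch"

# Later updates would use:
#   git subtree pull --prefix=vendor/lib --squash <repo> <ref>
tracked=$(git -C "$tmp/main" ls-files vendor/lib)
echo "$tracked"
```

The vendored files are part of the main repo's own history, so a plain `git clone` gets everything and there is no `.gitmodules` file involved.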

coryrc 1913 days ago

I tried subrepo and ran into some dumb error right away. I went with subtree and it worked well, but I needed to write a script for others to use it to update (the system already had one, but it didn't work well). I just can't say it's okay to go without a script, the way nobody needs one for git pull.

new_realist 1913 days ago

+1 for subrepo.

skyfaller 1913 days ago

My concern with subrepo is that I have no idea who is developing it and how much resources they have. I'd hate to learn to use a tool central to my development workflow and then have it discontinued.

Is git-subrepo good enough that it's worth the risk of using something not built into git (and the hassle of installing something extra)?

Kinrany 1913 days ago

For a minute I confused this with another external tool: git-filter-repo [0]. It's recommended by the official manual as replacement for git-filter-branch [1].

[0] https://github.com/newren/git-filter-repo/

[1] https://git-scm.com/docs/git-filter-branch

attah_ 1913 days ago

Terrible. You can't tell that they are there, and there's no tool support for remembering the remote.

ChrisMarshallNY 1913 days ago

I use them (occasionally), but only when the pain of not using them would be greater, because they are a blue-assed bitch.

But they really do enforce a very strict version control. If we want to be absolutely sure of a version, Git submodules will give that to you.

But I only use them in one PHP backend project that I plan to barely ever change, because every change means that I have to crawl through a raft of repos, updating a submodule chain.

It's exactly the kind of operation that calls for a scripting solution, but it is also one of those projects, where I change it so seldom, that it isn't worth it to write a script.

For my frontend (Swift) work, I use SPM (Swift Package Manager). Much easier on my nerves.

timzaman 1913 days ago

I've been maintaining a git repo at my current and previous employer - both started out with submodules, and got axed due to added complexity and cognitive load. After removal, they were not missed. Case closed.

geuis 1913 days ago

I can only speak for myself. In the roughly 12 years I've been using git at multiple different companies and for my own projects, I have never used git submodules. I feel like I have a good, well rounded set of experiences too. Code bases big and small, massive monorepos, and many microrepos all separate. In none of them have we used submodules.

That being said, maybe submodules would have solved some problem here or there that we had. I'm open to arguments in favor of them in that case. But we've always been able to get the work done without them, so I don't personally think they're indispensable.

boring_twenties 1913 days ago

The biggest pain point for me has been that they're basically incompatible with `git worktree`. By default they'll just be cloned from upstream in the new worktree, thereby defeating the purpose of using worktrees in the first place.

I used to have a bunch of hacky scripts for working around this, but lately I've just been giving up and avoiding submodules as much as possible.

SavantIdiot 1913 days ago

I was hoping to see better solutions to this problem in the comments, but the paucity of a solid solution means the problem isn't solved yet. I agree with `qznc` below, in that once maturity is achieved, the monorepo should x-furcate and create packages, but up until that point there really is no clean solution for this relational concept.

qznc 1913 days ago

I believe it is a question of maturity. If the subcomponent is mature enough, then turning it into a package is fine. The cost is that patching the subcomponent and testing the main component takes more effort (though you can script it). With submodules at least the build and test cycle is as quick as if it is in the same repo.

So there is a scale from quicker iteration to less coupled: same repo, submodule, different package. The question is in which cases the middle step "submodule" is worth it or if you should rather switch to one of the others always.

weitzj 1913 days ago

Exactly what I am thinking. My use case is to have a shared api repository containing an IDL (here protobuf/grpc) and you hook it up as submodules for iOS, android, Golang. The round trip time is reduced compared to creating and publishing modules for each generated code. This makes it far easier to experiment with a new api feature so one gets a better feeling how this behaves in each language. And if the api is mature enough you might as well change your development flow into publishing modules instead of using git submodules.

Therefore I like submodules and would argue that people might underestimate the increased round trip time for publishing modules and then referencing them in code compared to using submodules

omarish 1913 days ago

Everybody hates git submodules.

audunw 1913 days ago

It might be worth looking at West, used by Zephyr (embedded RTOS): https://github.com/zephyrproject-rtos/west

Not sure how mature it is or if any other projects use it. But seems to be working well for Zephyr.

attah_ 1913 days ago

At least it is better than subtrees, which cannot tell you what they are even if you ask. Also, ClearCase isn't atomic beyond file level even... you use labels to make "versions".

allo37 1913 days ago

About the point of it not being obvious where you're working: I find using zsh or another shell that prints git information as part of the shell prompt helps immensely with this.
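
A rough sketch of such a prompt helper, for anyone without a ready-made theme. The function name `git_where` and the output format are invented here. It relies on `git rev-parse --show-superproject-working-tree`, which prints the parent repo's path only when run inside a submodule.

```shell
# Print "<repo>@<branch>", prefixed by the parent repo's name inside a submodule.
git_where() {
  top=$(git rev-parse --show-toplevel 2>/dev/null) || return 0   # not in a repo
  ref=$(git symbolic-ref --short -q HEAD || git rev-parse --short HEAD)
  super=$(git rev-parse --show-superproject-working-tree 2>/dev/null || true)
  if [ -n "$super" ]; then
    printf '%s > ' "$(basename "$super")"
  fi
  printf '%s@%s\n' "$(basename "$top")" "$ref"
}

# Demo in a throwaway repo whose branch is named "trunk" for illustration.
tmp=$(mktemp -d)
git init -q "$tmp/demo"
git -C "$tmp/demo" symbolic-ref HEAD refs/heads/trunk
label=$(cd "$tmp/demo" && git_where)
echo "$label"   # prints: demo@trunk
```

In bash this could be wired up with something like `PS1='$(git_where) \$ '`; zsh needs `setopt PROMPT_SUBST` for the same trick.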

neolog 1913 days ago

I don't see the point of submodules. If you want to make a change in another repo, do that. If you want to reference a specific version of another package in your code, do that. When would I want submodules?

yellowapple 1913 days ago

I have a FOSS project[0] that uses submodules for two separate reasons:

In the first[1], it's to include the project's JS implementation in the project's website. Considering that this would be the only JS code running on the entire site, it seemed like overkill to throw in some newfangled asset manager like Bower just for a single NPM package.

In the second[2], it's to include the encoding/decoding test cases alongside the implementations. This way, instead of having to maintain a bunch of independent per-implementation unit tests, I can maintain all the tests in one place, and then have the per-implementation test suites snarf the test cases, and I then know with reasonable certainty that all my implementation libraries have equivalent behavior.

There are probably other, "better" ways to do both these things - I could bite the bullet and use Bower for the website, and I could have test suites download test cases on-the-fly - but submodules were the path of least resistance, and I've yet to encounter any significant downsides.

----

[0]: https://base32h.github.io

[1]: https://github.com/Base32H/base32h.github.io - specifically /assets/base32h/, which points to https://github.com/Base32H/base32h.js

[2]: https://github.com/Base32H/base32h.rb - specifically /spec/cases, which points to https://github.com/Base32H/base32h-tests

armada651 1913 days ago

> When would I want submodules?

When you have a library that is useful to more than one project, but not popular enough to have its own package on multiple package managers. Then the easiest way to reference a specific version of the library is to submodule it.

suzzer99 1913 days ago

They are a pain but they come in handy for us.

The main thing I learned is if you mess up any part of your submodule during creation - do not try to fix it. Just delete it from the parent repo and start over.

Also do not bother deleting it using git commands. Delete it in the .gitmodules file, then search your .git folder for every reference of the repo you want to delete (including folders named after it) and delete everything.

Either that or start with a clean parent repo clone.

rc_mob 1913 days ago

i wish git would implement a built in subsplit feature

fiddlerwoaroof 1913 days ago

That’s basically what the subtree commands are, no?

yuppie_scum 1913 days ago

They are a nightmare

tarkin2 1913 days ago

git subtree

They came about because everyone hates submodules.

colesantiago 1913 days ago

then don't use them?

PaulBGD_ 1913 days ago

> Spoiler alert. I do not hate submodules. I do how ever have an instant oh oh response when people mention they want to solve a problem with Git submodules.

First line of the article explains the title.

colesantiago 1913 days ago

> Why I hate submodules.

> Spoiler alert. I do not hate submodules.

Clickbait much? Make up your mind?

nix23 1913 days ago

Me too, and that's why i work whenever possible with BitKeeper and nested repositories.

crb002 1913 days ago

Also hate them. Far better to have a dependency pulling shell script - no overhead of git yak shaving - just maintaining the URIs.
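
A minimal sketch of what such a script could look like. The manifest format (one `<dir> <repo-URI> <revision>` triple per line) and the function name `fetch_deps` are assumptions made for this example, and paths containing spaces are not handled.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Clone each dependency if missing, then check out the pinned revision.
fetch_deps() {
  # The manifest is read on fd 3 so git cannot swallow it via stdin.
  while read -r dir uri rev <&3; do
    case "$dir" in ''|'#'*) continue ;; esac   # skip blank lines and comments
    [ -d "$dir/.git" ] || git clone -q "$uri" "$dir"
    git -C "$dir" fetch -q origin
    git -C "$dir" checkout -q --detach "$rev"
  done 3< "$1"
}

# Demo: a throwaway upstream with two commits, pinned at the first one.
tmp=$(mktemp -d)
git init -q "$tmp/upstream"
echo a > "$tmp/upstream/a.txt"
git -C "$tmp/upstream" add a.txt
git -C "$tmp/upstream" commit -q -m "one"
first=$(git -C "$tmp/upstream" rev-parse HEAD)
echo b > "$tmp/upstream/b.txt"
git -C "$tmp/upstream" add b.txt
git -C "$tmp/upstream" commit -q -m "two"

mkdir "$tmp/project"
printf '# dir uri revision\n%s %s %s\n' \
  "$tmp/project/dep" "$tmp/upstream" "$first" > "$tmp/project/deps.txt"
fetch_deps "$tmp/project/deps.txt"
pinned=$(git -C "$tmp/project/dep" rev-parse HEAD)
```

The trade-off versus submodules is that the pin lives in a text file the script interprets, so Git itself knows nothing about the relationship.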

tadfisher 1913 days ago

If you can wrap your build in Nix, I highly recommend it. The upcoming "flakes" feature handles pinning Git-hosted dependencies and locking revisions, even if they aren't Nix flakes themselves.

rubyist5eva 1913 days ago

If you get to a point where you think you absolutely need git submodules...just switch to svn, so much pain and misery can be avoided if you just use the right tool for the right job and SVN handles the git submodule use-case effortlessly (for C++ at least).

For languages with proper package management (ruby, python, go, node, etc...) put in the extra effort to utilize your package manager to update your dependencies instead of bothering with submodules. If you're still set on doing submodules, I'm willing to bet you're just "doing it wrong" (TM).

pfundstein 1913 days ago

Unfortunately "just switch to svn" is easier said than done and likely to cause more headaches than dealing with git submodules.

rubyist5eva 1913 days ago

Why?

Subversion is way easier to use than git, and it takes minutes to set up a server. Pushing your project history is just a few steps with git-svn, though it may take a while depending on the size of your project.

TortoiseSVN is probably the most straightforward and easy to use version control GUI there is.

pfundstein 1913 days ago

My point is that it's easier to manage and maintain a single version control system.

> Subversion is way easier to use than git

That's subjective, I'm more familiar with git thus it's easier for me.

> and it takes minutes to setup a server.

Git doesn't even require a server to use. You can create local repos, you can pull/push to remote repos via SSH/HTTPS/etc. No specialized server software needed.
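
To make that concrete, here is a sketch in which a bare repository in a plain directory stands in for "the server", and two clones exchange a commit through it over the filesystem. All names are invented for the demo.

```shell
set -eu
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
tmp=$(mktemp -d)

# A bare repository in an ordinary directory; no daemon or service involved.
git init -q --bare "$tmp/shared.git"

# Alice clones it (stderr hidden: Git warns that the repo is empty),
# commits, and pushes back.
git clone -q "$tmp/shared.git" "$tmp/alice" 2>/dev/null
echo "hello from alice" > "$tmp/alice/notes.txt"
git -C "$tmp/alice" add notes.txt
git -C "$tmp/alice" commit -q -m "add notes"
git -C "$tmp/alice" push -q origin HEAD

# Bob clones the same directory and sees Alice's commit.
git clone -q "$tmp/shared.git" "$tmp/bob"
bob_notes=$(cat "$tmp/bob/notes.txt")
echo "$bob_notes"
```

Swapping the directory path for an SSH URL gives the same workflow across machines, with only an SSH login on the remote side.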

> TortoiseSVN

TortoiseGIT. (Though I prefer headless)

paulryanrogers 1913 days ago

Are you suggesting Svn with externals? If so how is it better than Git and submodules?

(Having used both I'd say I dislike both equally.)

rubyist5eva 1913 days ago

I mean just having all your dependencies in a mono repo and vendoring the correct version your application depends on.

kelnos 1913 days ago

If that's all you want, you can absolutely use git as a monorepo, and you can check in vendored dependencies just as easily.

The main downside (and I admit it can be a big one) to a git monorepo (vs. svn) is that you can't check out a subtree all that easily.

(I do agree with the parent that svn externals is a lot more seamless than git submodules, though.)

cryptica 1913 days ago

I also hate monorepos. I consider the monorepo to be an anti-pattern.

It's an architectural advantage to separate each module into a different repo as it encourages careful separation of concerns.

If you find that you often need to update many modules together every time you want to add a new feature to your project, this is often an indication that your modules do not have proper separation of concerns and your abstractions are leaking. It means your project exhibits low cohesion and/or tight coupling between modules.

The difficulty in maintaining separate module dependencies is actually a very useful signal to you as a developer that your code is too tightly coupled and needs to be refactored into modules which are more independent.

Monorepos are a bandaid patch solution which covers up the root problem. The real problem is incorrect separation of concerns, AKA low cohesion which leads to tight coupling between your components.

It's not possible to design simple interfaces between components when these components have overlapping responsibilities.

linkdd 1913 days ago

Updating a backend API consumed by a frontend:

  - in 2 repos -> 2 PRs -> 2 test suites -> 2 code reviews
  - in a monorepo -> 1 PR -> 1 test suite -> 1 code review

When your project grows in complexity, there are some concerns that cross the boundaries of your repositories (CI/CD pipelines, testing and QA being a few examples).

Having a monorepo helps.

Consider having all your docker images and helm charts alongside the source code of the many parts of your big project. Is that really an anti-pattern?

EDIT: also a new dev arriving in the team, having to clone only one repository is easier for them. I also try to have a simple docker-compose stack so they have only one command to spin up the whole dev environment.

cryptica 1912 days ago

>> When your project grows in complexity, there are some concerns that cross the boundaries of your repositories

The notion of 'cross-cutting concerns' is also an anti-pattern. It's a violation of the 'separation of concerns' principle. A violation of the 'cross-cutting' kind, to be exact.

There are almost always better alternative solutions which don't involve cross-cutting concerns but which require a slightly more carefully thought out architecture.

When it comes to testing, I agree that (for example) integration tests are extremely valuable but I disagree that having the source code of your dependencies in the same repo yields any benefits for integration testing.

Ideally each module dependency should have its own set of tests which test its features based on the appropriate level of abstraction. Dependencies should be more 'general purpose' (suit more different use cases) while higher level logic should be more fitted to the specific business domain. Integration tests should not test the implementation of module dependencies; dependencies should have their own tests.

Higher level tests can sometimes help to uncover issues in dependencies and thus help you to design the tests of those dependencies but keeping them separate is essential because the dependencies should represent a completely different level of abstraction.

You don't want to end up tightly coupling the tests of the main project with the implementation details of its dependencies. Separating the tests correctly helps you to ensure that the scope of your tests is limited to the correct level of abstraction.

My point is that while it's desirable to integration-test a project with its dependencies plugged into it, those tests should not reference any specific implementation details of those dependencies... Because, otherwise, unrelated changes in the implementation of the dependencies are likely to break your higher level tests (which should not be the case); code changes within dependencies should only break your higher level tests if those changes affect higher level behavior.

For example, changing method names and arguments of a dependency should not break your top level integration tests (assuming you've made the matching code changes in your main project source, you shouldn't need to change the top level integration tests at all, they should still pass), the top level tests shouldn't care what the method names of dependencies are and they especially shouldn't care about how those methods are implemented.