by drewcrawford 4814 days ago
I am not a git maintainer, but as someone interested in improving submodules I can try to summarize the thread.

Submodules are difficult to use in practice for a wide variety of reasons. There are serious, complex proposals that have made it into git-contrib to build a "better" submodule, but for various reasons these have produced systems that merely make the tradeoffs in a different way that some people prefer.

This is not like any of those proposals. His problem is that "git add", "git diff", etc., don't "understand" submodules. It would be as if ls, cd, etc. didn't "follow" symlinks, so that you had to navigate to the correct directory yourself before you could use standard unix tools.

This is a serious problem, but his solution is essentially "we should use hardlinks instead of symlinks". That is, he wants to take the code that understands submodules out of the individual tools, and pop it in the filesystem somewhere where it is "shared" among more of the tools and doesn't have to exist in any of them.

There are many objections to this proposal. The chief one seems to be that this does not seem to directly address any particular problem. I think Ramkumar perceives the reason git add/diff/rm don't support submodules as a metaproblem: "it is too hard to add submodule support to an arbitrary tool". The git maintainers, by contrast, are saying "it is possible to add submodule support to an arbitrary tool." So that's the initial standoff.

Another problem is that this requires a filesystem change, and the filesystem is essentially the most stable part of git; changing it breaks compatibility with other versions. If you read Linus's rants, you know that he generally applies an enormous amount of scrutiny to breaking compatibility. And so from his desk, you would need not just one clear benefit, but an overwhelming number of them, to break the contract like this.

But what I suspect is the True Rejection here is that this will pan out like all the proposals before it: to be different, but not strictly better, than the current implementation. To return to the POSIX analogy: we have both symlinks and hardlinks, and which one is better depends on what you are doing; there is no "one true link". If you replace all the symlinks with hardlinks, I think you will run into trouble with the hardlinks too.

Finally, it is unfortunate that the flamewar is about the monolithic patch rather than about some of the principles that led to the patch. I think Ramkumar has had (at least) two very good insights: that "git add" and friends should understand submodules a lot better than they do, and also that they should have this understanding by way of consuming some API that understands them rather than incorporating separate code for submodules into every tool. These strike me as a concrete improvement over the existing system, and I wish that the energy that leads to huge unusable patches like this could be redirected into usable ones.


    The chief one seems to be that this does not seem to directly address any particular problem.
Except that you later say:

     I think Ramkumar has had (at least) two very good insights: that "git add" and friends should understand submodules a lot better than they do, and also that they should have this understanding by way of consuming some API that understands them rather than incorporating separate code for submodules into every tool.
This is exactly the problem this solution solves. Instead of having a weird configuration file in the working tree for something that should be an integral part of the repository, there will be a generic system for adding links. With this generic system in place it is much easier to implement submodule support in "git add" and friends.

He repeatedly makes this clear but no one reacts to this point.

    But what I suspect is the True Rejection here is that this will pan out like all the proposals before it: to be different, but not strictly better, than the current implementation.
Implementing code in a different but not strictly better way that allows you to more easily understand and extend your library is called refactoring. This 'True Rejection' is essentially rejecting the merit of refactoring code.

I also don't think that the hardlinks/symlinks analogy holds very well. Hardlinks and symlinks are both features in their own rights. Moving submodules from a weird file into your repository's objects is a superficial change, as he also states. Everything the current submodules do could be achieved using the proposed solution. (As he repeatedly has to make clear to Linus and Junio)

There is a complicated set of problems preventing us from understanding each other. I am going to do my best.

> weird configuration file

One of the disputes here is that the maintainers are of the opinion that config files are actually good, on the face of them. They point to examples of well-settled uses like .gitignore to claim that config files are The Git Way.

It may very well be that configuration files are in fact weird, or are weird in this particular case, but since the convention is and has been for git's history that config-files-are-good it would require a well-reasoned essay to move the needle of discourse on this subject, not just to use "they are weird" as a claim to prove something else.

> This 'True Rejection' is essentially rejecting the merit of refactoring code.

I don't want to get into a big meta-meta flamewar here, but there are many people who do reject the merits of refactoring working code, for some definitions of "refactor", for some definitions of "working", and this has been the subject of many popular essays, most notably Spolsky et al. This is another place where moving the needle of discourse would require writing a well-reasoned essay that quotes the appropriate authorities, and it is not sufficient just to appeal to a particular view of the merits of refactoring as a claim to prove something else.

> Hardlinks and symlinks are both features in their own rights.. [this] is a superficial change.

This is another one of those thorny semantic problems that are preventing us from understanding each other. There is a sense in which it is superficial, and another sense in which it is a substantial change. If you are using "git add", or are implementing it, it is a superficial change. If you are writing subtree-merge or git-submodule or something that really needs to understand the storage of submodules, it is substantial.

And so they are both features in their own right, in the sense that: git-add-and-friends will want to access things with a certain pattern, and git-submodule-and-friends will want to access things in a very different pattern. This is why I suspect the solution here is to have two distinct APIs, that access the same underlying storage mechanism. And if it makes sense to continue to support something very much like the old API, it probably does not make sense to redesign the FS to look like the new API.

Of course, there is a lot of resistance in the git community to having two ways to do the same thing. So when I say "I suspect the solution is to have two APIs" I mean only that it would address most of the objections raised thus far, not that it would actually be implemented in mainline.
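
To make the "two APIs over one storage mechanism" idea a bit more concrete, here is a rough sketch in Python. Nothing in it is git's actual code or on-disk format: `SubmoduleRecord`, `SubmoduleStore`, `PathView` and `AdminView` are names invented for illustration, as are the example paths, URLs and commit ids. It only shows the shape of the argument: one view answers the path-oriented question that add/diff-style tools ask, the other exposes the stored records that submodule-management tools need, and both read the same data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubmoduleRecord:
    """One stored submodule entry (hypothetical shape, not git's real format)."""
    path: str    # where the submodule sits in the working tree
    url: str     # where it would be fetched from
    commit: str  # the commit the superproject pins


class SubmoduleStore:
    """Hypothetical shared storage layer; both views below read from it."""

    def __init__(self, records):
        self._by_path = {rec.path: rec for rec in records}

    def records(self):
        return list(self._by_path.values())


class PathView:
    """View for add/diff-style tools: which submodule owns this path?"""

    def __init__(self, store):
        self._store = store

    def owner_of(self, file_path):
        # The longest matching prefix wins, so nested submodules resolve
        # to the innermost one. Returns None for superproject files.
        best = None
        for rec in self._store.records():
            if file_path == rec.path or file_path.startswith(rec.path + "/"):
                if best is None or len(rec.path) > len(best.path):
                    best = rec
        return best


class AdminView:
    """View for submodule-management tools: direct access to stored records."""

    def __init__(self, store):
        self._store = store

    def pinned_commits(self):
        return {rec.path: rec.commit for rec in self._store.records()}


# Made-up example data; the URLs and commit ids are placeholders.
store = SubmoduleStore([
    SubmoduleRecord("vendor/libfoo", "https://example.invalid/libfoo.git", "abc123"),
    SubmoduleRecord("vendor/libfoo/deps/bar", "https://example.invalid/bar.git", "def456"),
])
paths = PathView(store)
admin = AdminView(store)
```

The point of the sketch is that neither view dictates how `SubmoduleStore` keeps its data, so under this assumption the storage could stay exactly as it is today while the path-oriented view is added on top.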

> Everything the current submodules do could be achieved using the proposed solution. (As he repeatedly has to make clear to Linus and Junio)

And as Linus and Junio have repeatedly made clear, merely doing everything the current implementation does is not within a few galaxies of meeting the burden for breaking FS compatibility. The compatibility-break burden is extremely high.

> I am going to do my best.

Great :)

> One of the disputes here is that the maintainers are of the opinion that config files are actually good, on the face of them. They point to examples of well-settled uses like .gitignore to claim that config files are The Git Way.

Yes, but .gitignore only configures your git client; .gitmodules says something about the repository instead. If that were the git way, wouldn't branch names be in a .gitbranches as well?

> I don't want to get into a big meta-meta flamewar here, but there are many people who do reject the merits of refactoring

I might be an extremist on this topic, so it's good to just leave it be.

> This is why I suspect the solution here is to have two distinct APIs, that access the same underlying storage mechanism.

I agree, but I think Ram. is correct in asserting that both ways could be achieved by having a link object with some configuration in it (it could just be the .gitmodules file moved to the .git directory, for all the end users care).

> The compatibility-break burden is extremely high.

I understand, and it should not be taken lightly. But no one was suggesting this feature would be added to master and shipped in the next release of git. It could even be delayed until there is another compatibility-breaking change. Ram. never pretended his current work would be the final way of doing it.

Thank you for elaborating your understanding of the discussion :)

This is one of the nicest disagreements I have ever had. If we don't already, we should compare notes and find something to work on together, because when two people can disagree but still understand each other, that is where you make progress on complex problems. :-)

> Yes, but .gitignore only configures your git client; .gitmodules says something about the repository instead.

This feature is often used to configure the repository, and I in fact use it that way. By way of example, https://github.com/new operates under the assumption that you use .gitignore to configure a repository. Perhaps it is best to say that config files offer flexibility in this dimension, whereas a link file is more rigid.

> It could even be delayed until there is another compatibility breaking change.

I believe that perhaps the discussion on the point of backwards incompatibility has been framed in a way that is nonproductive. Of course, once one has decided on a course of action, it is proper to consider how to reduce the impact of that decision. I agree with you that there are a wide variety of harm reduction strategies available here.

But these inquiries only become relevant once one has decided that the patch is in general an improvement in some dimensions. As an outside observer, I do not see an improvement.

I can see the logic that if it is true that git-add-and-friends have omitted support for submodules on the basis that such support is difficult, this patch could solve that problem. But I have not been convinced of the premise; there is no citation of the people who maintain the UI tools making claims of difficulty. Furthermore, Junio seems to argue at least that add's behavior is by design. I do not know enough about it to know if that is a sensible design, but it does suggest to me that the problem with UI tooling is not a function of implementation difficulty, and that there is perhaps some design or ideological reason for the behavior of these tools that explains the state of them today.

The other problem that I have is as follows: if I accept the premise that the trouble with git-add is a matter of implementation difficulty, it seems to me that the trouble can be resolved at some other tool layer rather than in the FS proper. So if the hypothesis underlying the patch is correct, it seems to me that one should adopt the implementation that doesn't break compatibility over the implementation that does.

It is unfortunate that the matter of backwards compatibility was raised early and vociferously in the thread, because as you have pointed out there is a lot that can be done about backwards compatibility that doesn't address the real merits of whether the idea is good or bad. (Although I can understand why compatibility would be at the top of any maintainer's mind.) Perhaps this exchange between Junio and Ram. is an example of two people being far enough along their own lines of inquiry that they are having trouble making any sense of one another.

> I don't want to get into a big meta-meta flamewar here, but there are many people who do reject the merits of refactoring working code, for some definitions of "refactor", for some definitions of "working", and this has been the subject of many popular essays, most notably Spolsky et al.

Spolsky wrote against rewriting your software from scratch [1], but I couldn't find anything of his against refactoring. The two are very different things.

[1] http://www.joelonsoftware.com/articles/fog0000000069.html

A 'refactoring' is a change that doesn't change behavior, so the word is a red herring in this context, generating more heat than light. Redesigns can be valuable, but let's call a spade a spade.