by brennen 2053 days ago
> I think git in general should copy the approach of Fossil and include issue management and wikis along with the repo, to keep things consistent and avoid vendor lock-in.

A few paragraphs I recently wrote elsewhere:

The entire state of code forges as a general thing in 2020 is all the evidence you could possibly want that version control systems (Git, I'm talking about Git) are themselves massively deficient in design.

I rant about this all the time, but there is an entire class of argument about how & whether to use GitHub / GitLab / Gitea / Phabricator / Gerrit / sourcehut / mailing lists / whatever that would mostly vanish if the underlying data model in the de facto standard was rich enough to support the actual work of software development. Because it's not, we find ourselves in a situation where no widely used DVCS is actually distributed in practice, and the tooling around version control is subject to platform monopolization by untrustworthy actors and competitive moats.

Code review should itself be distributed/federated, but few of the people involved have incentives to make that happen. It's possible something like https://github.com/forgefed/forgefed will eventually get traction, and Git has been dominant for long enough that I wonder all the time when we might see a viable successor that learns from its fundamental mistake. In the meantime we're forced to choose from a frankly pretty terrible lot of options in the broad structural sense.

(For clarity, I'm a WMF employee and am involved in the decision to migrate to GitLab.)

3 comments

I feel like the git model makes a lot of sense when viewed as an extension to the mailing list code review system. But most people don't want that model. However, trying to fit git to other models is a bit of a round peg into a slightly square hole imo.
Yeah, from that angle and from the perspective of 2005 it's a reasonable design, and I think what I describe above as a massive deficiency only really becomes visible in the light of everything that's happened since.
To me, it sounds like the issue is that you need a central source of truth that everyone can pull from for their purposes, and distributing the code review part doesn't sound like it'll add much. In the current climate, most anyone requesting code review is probably trying to merge into the main central source of truth anyways, so what actual benefit does it bring to either the maintainers or the contributors?
Version control for a genuinely long-lived project is a problem that often outlasts:

- Dominant version control and code review system(s) / paradigms.

- The current configuration of institutional owners.

- Users' trust in an owner / sponsor / maintainer. (Forks happen for reasons.)

- The involvement of developers who remember why and how decisions were made.

- The trustworthiness of the entities that control services, applications, and network real estate used for development.

Some central source of truth is usually necessary, but maintainers and contributors don't benefit when that source of truth is subject to vendor lock-in or can otherwise only migrate at great cost. For all the collaborative benefit that GitHub has undeniably wrought, platform monopolies are eventually a failure mode for end users, at least as for-profit enterprises. With the exception of the dominant silo vendors, nobody in the ecosystem really benefits from being forced to choose a silo that will be hard (and lossy) to escape later. The silos are engineered to limit mobility and channel interoperability to their own ends, for business reasons that run directly contrary to the interests of their users.

If the protocol at hand were actually up to the task, we'd spend less effort and anxiety on the problems of all the non-protocol platform tooling that's been built up around it.

> an entire class of argument [...] mostly vanish if the underlying data model in the de facto standard was rich enough to support the actual work of software development.

Interesting idea. You think we could develop a unified data model that covers source code, static files, documentation, project management and community management as a single unified thing?

That’s certainly ambitious, and I’d love to see it. For the moment it seems that Git has won for source code (in a pretty crowded field) because just that part was hard and it was a big improvement. The collaboration tools it includes, mostly around email, appear to be inadequate for most projects. So now we see a healthy ecosystem that adds rich collaboration on top of / next to Git.

> no widely used DVCS is actually distributed in practice

I think this is due to economic and social factors rather than technical ones. Fully distributing a Git repo is very doable, but harder to think about than the GitHub model. Plus you have all the normal P2P problems around who’s online and how good their connection is.

> tooling around version control is subject to platform monopolization

Again, I think this is simply the social network effect more than anything else. Making a website for your project lets people find it, use it, and contribute to it. The bar to entry is lowered further if it’s a common platform, where people already have accounts and know how it works, and where they can get a consolidated view of all their activity.

Centralized hosting makes even more sense as projects grow and you only want a subset of the code on any given development machine. Eventually big monorepos present serious scaling challenges.

Still... I completely agree that it would be awesome to have a more self-sovereign computing architecture writ large. I’m just pessimistic we can get there from here.

> You think we could develop a unified data model that covers source code, static files, documentation, project management and community management as a single unified thing?

Realistically, not exactly, given how much space some of those things cover.

I do think that entities like code review are as much a part of the history of a project as the deltas to code. Reviews not being first-class objects in the VCS itself has turned out to be a crack into which you can wedge an entire GitHub.
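
To make "first-class" a bit more concrete, here is a rough sketch of a review stored as a content-addressed object next to commits. The hashing follows how Git computes object IDs (SHA-1 over a `<type> <size>\0` header plus the payload). Everything else is an assumption for illustration only: Git has no `review` object type, and `make_review` and its fields are hypothetical names, not part of Git, Gerrit, or any forge.

```python
import hashlib
import json


def object_id(kind: str, payload: bytes) -> str:
    # Git-style object ID: SHA-1 over "<kind> <size>" + NUL byte + payload.
    header = f"{kind} {len(payload)}".encode() + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def make_review(commit_id: str, reviewer: str, verdict: str,
                comments: list[str]) -> tuple[str, bytes]:
    # Hypothetical "review" object: not a real Git object type.
    # Canonical JSON (sorted keys, fixed separators) keeps the ID stable
    # no matter which host or tool serialized the review.
    body = {
        "commit": commit_id,
        "reviewer": reviewer,
        "verdict": verdict,
        "comments": comments,
    }
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return object_id("review", payload), payload


# A review pointing at a commit by its ID; the commit ID here is just
# the hash of some example bytes, standing in for a real commit.
commit = object_id("blob", b"hello\n")
review_id, review_payload = make_review(
    commit, "alice@example.org", "approve", ["Looks fine to me."]
)
print(review_id)
```

The point of the sketch is only that a review addressed this way would get the same ID on every host, so it could travel with a clone instead of living in one forge's database. It says nothing about the harder parts (threading, permissions, a usable interface).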

I won't claim I know where best to draw the line here. Better handling of large static files by default and a robust way to model relationships between projects obviously belong within the VCS. On the other hand, relationships modeled in issue tracking systems and the like are also part of the software's history, but past some level of complexity it gets much harder to imagine wedging them into something that you can pass around like you clone a Git repo. All I can really say for sure is that it feels broken that all of this stuff lives in competing application silos.

(As a sidebar: Not that you can't jam things like review data into git-as-data-store. Gerrit does just that. But nobody's going to mistake that for a usable interface to code review.)

Anyhow, I don't think you're wrong about the social & economic factors, but I think a different landscape with less concentration of power could have shaken out if (for example) easy code review had been baked in and host-agnostic early on. Fully p2p architectures aren't feasible, or even necessarily desirable, for a lot of problems - but it shouldn't be too much to ask that things are able to be federated and resistant to capture by a single vendor.

> Still... I completely agree that it would be awesome to have a more self-sovereign computing architecture writ large. I’m just pessimistic we can get there from here.

Yeah, fair enough. I am myself boundlessly pessimistic about the future of computing generally.