by eatonphil 1802 days ago
Really dumb question while I'm trying to decide whether to use something like git vs. CRDTs to handle version control for user changes made in an app I'm working on: why do we even use git anymore for source code version control if we want behavior like this?

Nobody likes merge conflicts. We all want versioning. So long as we have versions at all why isn't the ideal interface for developing in teams something more like editing in Google docs? Why aren't we just doing that? Why are we still using systems that produce merge conflicts?

Hoping for insight from folks who have either done this themselves or looked into it.

Edit: one particularly nice feature of Google docs over Confluence is that in Google docs I can suggest changes, which is somewhat akin to branches in git. I don't need to force my change through without review. This isn't a natural part of CRDTs, but it sounds like the nicest source control system might be CRDTs plus branches?

8 comments

Feels like this should be a post in itself! This is just my opinion, so I'm open to any pushback / comments.

IMO the main reasons for source control are:
- having revisions / diffs that can be reviewed by someone else
- working independently in a distributed fashion
- having an easy way to roll back to an older revision

I'm not too familiar with CRDTs and will add more comments after reading more about them, but let's take Google docs as an example here. What if we were all writing code in Google docs? How would that turn out?

Google docs does a decent job of versioning, so the last point can still be satisfied. I think with a source-code-styled version of Google docs, one could easily tag versions and do rollbacks. The issue arises when collaborating. In my experience working in spreadsheets where more than 10 users are actively editing, overwriting each other becomes very common. You can argue that live editing also makes this easy to fix, but what about running the code locally?

One would still have to take a snapshot and hope that it actually compiles. From my personal experience, while I'm writing code, it rarely compiles on the first attempt. So we would really have to work on snapshots that we know compile. At that point, you are really creating "commits" and pushing them to the remote. I am not sure we can completely avoid conflicts in such a model.

In real-time collaboration, like Google docs, merge conflicts can be avoided because you see them play out live and can react to them.

In an offline collaboration system, merge conflicts are a feature: they bring attention to inconsistencies you might otherwise overlook.

This has very little to do with how the data is stored.

It’s a good question. Well, CRDTs are just an underlying data type that guarantees conflict-free eventual consistency. That doesn’t mean the result would be an English paragraph or buildable code, however. What git does really well, and bare CRDTs don’t, are things like branching and local experimentation. Being intentional with commit points can also keep file sizes down; in a more real-time tool like Docs, potentially every character change would be stored. CRDTs can have performance problems with things like file size growth.
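
To make the "consistent but not necessarily buildable" point concrete, here is a minimal sketch in Python. It is a toy, not how any real CRDT library works: the document is a grow-only set of hypothetical `(sort_key, replica_id, text)` entries, and merge is plain set union. Both replicas converge to the same state regardless of merge order, yet the converged text no longer compiles.

```python
# Toy state-based CRDT (an assumption for illustration, not a real library):
# a grow-only set of (sort_key, replica_id, text) entries.
# Merge is set union, so it is commutative, associative, and idempotent.

def merge(a: frozenset, b: frozenset) -> frozenset:
    """Combine two replica states; the order of arguments does not matter."""
    return a | b

def render(doc: frozenset) -> str:
    """Turn the set into text, ordering entries by (sort_key, replica_id)."""
    return "\n".join(text for _, _, text in sorted(doc))

def compiles(source: str) -> bool:
    """Check only that the text is syntactically valid Python."""
    try:
        compile(source, "<merged>", "exec")
        return True
    except SyntaxError:
        return False

base = frozenset({
    (1, "base", "if ready:"),
    (2, "base", "    go()"),
})

# Two users concurrently append an else branch at the same position.
alice = base | {(3, "alice", "else:\n    wait()")}
bob = base | {(3, "bob", "else:\n    abort()")}

merged_ab = merge(alice, bob)
merged_ba = merge(bob, alice)

print(merged_ab == merged_ba)        # replicas converge
print(compiles(render(alice)))       # each side is fine on its own
print(compiles(render(merged_ab)))   # the merged text has two else branches
```

The merge never reports a conflict, which is exactly the guarantee; whether the result means anything is a separate question.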

For your use case of user changes, CRDTs seem like a good option. It would be interesting to explore building a version control system on top of CRDTs; if done right, it could have the benefits of git with fewer merge conflicts.

Fork => Modify => Merge.

Suppose you have a 100 page gdoc. You start editing one paragraph per page, and voila, everything is fine. Casual edits are consistent "in context".

However, if you start editing on page 1, then skip to page 100 and go back to page 50, it's possible that even though "there was no conflict" (people weren't editing the same section of the doc at the same time), the document was in a "semantically inconsistent state" while you were making your edits (i.e., someone trying to "compile" your doc between the beginning and the conclusion of your changes would have hit a logic error until you'd made that final edit).

Git (and branching, as you're suggesting) allows you to select the next "actual" future from many "plausible" futures.

Think of A.C.I.D. You want your changes to be atomic (gdoc edits are not atomic across multiple "pages"), you want your changes to be durable (gdocs may/may not hit your definition for durability if changes can be made to the lines arbitrarily), consistency and isolation are also compromised with the free-flowing editing style you're describing.
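
A small Python sketch of the atomicity point, under the assumption that "the doc" is a short script and a "multi-page edit" is a rename touching two distant lines. The names `apply_live` and `apply_atomic` are made up for illustration. Applied one edit at a time, a reader who runs the document between the two edits sees a broken program; applied as one unit (roughly what a commit gives you), no broken state is ever visible.

```python
def runs(source: str) -> bool:
    """Return True if the script executes without raising."""
    try:
        exec(compile(source, "<doc>", "exec"), {})
        return True
    except Exception:
        return False

doc = [
    "def area(w, h):",
    "    return w * h",
    "",
    "result = area(2, 3)",
]

# One logical change (rename area -> rect_area) that touches two places.
edits = [
    (0, "def rect_area(w, h):"),
    (3, "result = rect_area(2, 3)"),
]

def apply_live(lines, edits):
    """Apply edits one by one, yielding every state an observer could see."""
    current = list(lines)
    for index, text in edits:
        current[index] = text
        yield "\n".join(current)

def apply_atomic(lines, edits):
    """Apply all edits, exposing only the final state."""
    current = list(lines)
    for index, text in edits:
        current[index] = text
    return "\n".join(current)

live_states = list(apply_live(doc, edits))
print([runs(state) for state in live_states])  # intermediate state is broken
print(runs(apply_atomic(doc, edits)))          # only the finished change is seen
```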

Imagine dumping a python script into gdocs, pulling down the contents, and running it through a local python interpreter.

You could get pretty far with low amounts of edits and using the "comment" feature liberally, but once you try and "propose a refactor" you immediately need to "fork the universe" and have two (or more) semantically distinct representations of the document, or some way of queuing "multi-page edits" such that each could come in ... some sort of "merge queue" .... ;-)

For user changes, a solution based on OT/CRDT may be appropriate.

For code, you need merge conflicts. Let me put it like this: Do concurrent edits in google docs always produce a valid and correct paragraph, with no grammatical errors, and that communicates the proper thoughts?

If we need merge conflicts that must be manually resolved, how can this product being discussed exist?

If merge conflicts can be automatically resolved, why aren't we using a system that operates like that under the hood rather than needing a product like this bolted on?

The product being discussed here doesn't solve the issue of merge conflicts itself, but rather of eventual consistency. Merge conflicts are only a part of it, and there are some heuristics that can be used to reduce them, but they eventually require manual review. Perhaps with GitHub Copilot this can improve?

An example of the problem we are describing is explained here: https://blog.mergequeue.com/managing-github-merges-for-high-...

There are ways to resolve merge conflicts automatically in some cases given some assumptions, but there is no way to resolve any given merge conflict without a system that can create a correct program. I won't bore you with the formal proof for this (especially since I have no idea how to construct such a proof). Consider that some merge conflicts require new code to be written to resolve them.

For example, here's an original program:

    if(foo()) {
      send_money_to_oftenwrong()
    }

and here's a change we would like to merge:

    if(bar()) {
      send_money_to_oftenwrong()
    }

and here's a concurrent change we would like to merge:

    if(baz()) {
      send_money_to_oftenwrong()
    }

How do you produce the correct program from these conflicting changes?
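
For illustration, here is roughly what a line-wise three-way merge sees for the example above. This is a simplified Python sketch that assumes all three versions have the same number of lines; real tools align lines with a diff first. The function name `merge3` is made up.

```python
def merge3(base, ours, theirs):
    """Line-wise three-way merge for versions of equal length.

    Returns (merged_lines, conflicts). A conflicting line is left as None
    in merged_lines and reported as (index, ours_line, theirs_line).
    """
    merged, conflicts = [], []
    for i, (b, o, t) in enumerate(zip(base, ours, theirs)):
        if o == t:        # both sides agree (or neither changed the line)
            merged.append(o)
        elif o == b:      # only their side changed the line
            merged.append(t)
        elif t == b:      # only our side changed the line
            merged.append(o)
        else:             # both changed it, differently: no mechanical answer
            merged.append(None)
            conflicts.append((i, o, t))
    return merged, conflicts

base = ["if(foo()) {", "  send_money_to_oftenwrong()", "}"]
ours = ["if(bar()) {", "  send_money_to_oftenwrong()", "}"]
theirs = ["if(baz()) {", "  send_money_to_oftenwrong()", "}"]

merged, conflicts = merge3(base, ours, theirs)
print(conflicts)
```

The tool can tell you that line 0 is contested, but `bar()`, `baz()`, `bar() && baz()`, and `bar() || baz()` are all plausible resolutions, and choosing between them depends on what each author intended.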

---

Also for example, an unsophisticated and unsound method for automatically resolving some merge conflicts:

1. Assume that if the build and tests succeed, the program is correct. (This is the most important bit)

2. Accept all non-conflicting changes.

3. For conflicting changes, accept one side of the conflict.

4. Build and test the result. If it succeeds, merge it.

5. Otherwise, attempt the same with the other side of the conflict. If it succeeds, merge it.

6. Otherwise, it cannot be merged automatically by this method.

Note that this approach could also apply to a CRDT-based merge.
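
The steps above can be sketched in Python as follows. This is only an illustration of the unsound method described, with some simplifying assumptions: versions are equal-length lists of lines, every conflict is resolved toward the same side within one attempt, and `passes` is a hypothetical stand-in for "build and run the tests".

```python
from typing import Callable, List, Optional

def auto_resolve(
    base: List[str],
    ours: List[str],
    theirs: List[str],
    passes: Callable[[List[str]], bool],
) -> Optional[List[str]]:
    """Try 'ours' for every conflict, then 'theirs'; keep the first that passes.

    Step 1 of the method (passing tests implies correctness) is the unsound
    assumption baked into trusting `passes`.
    """
    for preferred in (0, 1):  # 0 = prefer ours, 1 = prefer theirs
        candidate = []
        for b, o, t in zip(base, ours, theirs):
            if o == t:
                candidate.append(o)                # no disagreement
            elif o == b:
                candidate.append(t)                # non-conflicting change (theirs)
            elif t == b:
                candidate.append(o)                # non-conflicting change (ours)
            else:
                candidate.append((o, t)[preferred])  # conflict: pick a side
        if passes(candidate):
            return candidate
    return None  # cannot be merged automatically by this method

base = ["if(foo()) {", "  send_money_to_oftenwrong()", "}"]
ours = ["if(bar()) {", "  send_money_to_oftenwrong()", "}"]
theirs = ["if(baz()) {", "  send_money_to_oftenwrong()", "}"]

def only_baz_passes(lines: List[str]) -> bool:
    """Hypothetical test suite that happens to accept only the baz() version."""
    return any("baz()" in line for line in lines)

result = auto_resolve(base, ours, theirs, only_baz_passes)
print(result)
```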

I'm on a medium-sized team, so this might not be possible for everyone. The way I resolve this in my trunk-based source control system (so we heavily use feature flags) is to just never merge.

Everyone who checks in code first stashes their changes, pulls all code, then pops the stash and deals with any conflicts locally, does the relevant testing, and then checks in. The entire repo is checkin 1 -> checkin 2 -> checkin 3 -> etc.

When viewed in this light it is no different than if you had pulled the baz() code before you started editing anything. Would you have changed it to bar() or not?

On the contrary, I for one am amazed how often no intervention is necessary on automatic merges, given how finicky compilers are in general! If the teams are not ridiculously large and people mostly work on their own stuff and coordinate work on the "common" part, isn't it mostly good?

There are certainly usability issues with git but I wouldn't literally use Google docs with colored squiggly lines for the changes from Anne and Bob...

I don't mean literally Google docs but a storage system more like that than git. :)

> why isn't the ideal interface for developing in teams something more like editing in Google docs

Engineering and debugging are hard enough as it is. Doing this while one or more people are changing the code in real time would be a nightmare.

Because you'd never have a working revision to release, since not everyone on the team will finish their features at the same time.