Hacker News
by dakiol 238 days ago
I've worked in huge repos with hundreds of developers pushing code every day, dozens of MRs open per day, and all I ever needed was a very limited set of what git is capable of (git commit, git co, git st, git merge/rebase, git log).

To find bugs, I use "bisect but visually" (I usually use jetbrains IDEs, so I just go to the git history, and do binary search in the commits, displaying all the files that were affected, and jumping easily to such versions).
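
The manual binary search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `first_bad_commit` and `is_bad` are hypothetical names, the commit list is ordered oldest to newest, the first commit is known good, the last is known bad, and the bug stays present once introduced.

```python
from typing import Callable, Sequence


def first_bad_commit(commits: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest commit for which is_bad() is true.

    Assumes commits[0] is good, commits[-1] is bad, and every commit
    after the first bad one is also bad.
    """
    lo, hi = 0, len(commits) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(commits[mid]):
            hi = mid  # the bug was introduced at mid or earlier
        else:
            lo = mid  # the bug was introduced after mid
    return commits[hi]


# Hypothetical history where the bug first appears in "c5".
history = ["c0", "c1", "c2", "c3", "c4", "c5", "c6", "c7"]
culprit = first_bad_commit(history, lambda c: int(c[1:]) >= 5)
```

Each check halves the remaining range, which is why clicking through history in an IDE stays manageable even for long commit lists.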

Git conflicts are easily solvable as well with a GUI (JetBrains IDEs), via the CLI, or via something like Sourcetree. Easily the most used "feature" of git that I use is:

- for a given line of code, see all the files that were touched when that line was introduced

But I usually do that via the IDE (because to go through dozens of files via cli is a bit of a hassle for me)

So, what am I missing? I know jujutsu is much simpler (and more powerful) than git, but I have only used the "good parts" of git and it has never been a bottleneck... but ofc, you don't know what you don't know.


The biggest for me: merge-conflict as first-class state within JJ.

I regularly have multiple commits being worked on at a time across different parts of the codebase. If I have to sync to head (or any rebase) and one of my side branches that I'm not actively working on hits a merge conflict, I don't have to deal with it in that moment and get distracted from my work at hand (ie: I don't need to context switch). This is a big productivity win for me.

If you want some other points, check out: https://fallthrough.transistor.fm/43#t=0h31m5s

Some points from the episode:

* With no separate index vs commit, (everything is just a commit), you don't need different commands and flags to deal with the different concepts, they are all just marked together. In JJ, if you want to stack/stage something, it's just a normal commit (no reason to have different concepts here).

* You don't have to name/commit a change at all. Every time you run any JJ command (like `jj log`, or `jj status`), it will snapshot the changes you have. This means that if you want to go work on some other branch, you don't have to go and commit your changes (they auto-commit, and you don't have to write a description immediately), then update to master branch and start working again.

* Or you can just `jj split` (https://jj-vcs.github.io/jj/latest/cli-reference/#jj-split), and split a working changeset into 2 separate commits.

This seems similar to the "Local History" feature in JetBrains IDEs.
JJ starts to shine when you do git surgery on commit chains that need to stay logical: constantly editing previous commits, moving code from your working area into those previous commits, and rebasing onto origin/main.

I also really like that every change is automatically committed. It’s a great mental model once you get used to it.

Git rebase works fairly well and is somewhat uneventful, unless there are major changes happening. I do hate the experience when one file was removed in my feature branch, but main did a major refactor which affected the original file, so conflicts are a bit awkward then - but other than that, this seems like a fairly clean workflow.
Git rebase is an enormous pain in the ass.

Rebases must be done linearly. And right now! Oops, you made an error in an earlier stage of the rebase? Start over, good luck! Want to check something from earlier while you’re in the middle? Sorry, you’re in a modal state and you don’t get to use your regular git tooling.

You can just record all your changes with git commit --fixup and then do a non-interactive rebase that just applies all the changes.

You can use all the regular git tools in a rebase, in fact it would be quite useless without. You can also just jump to other branches or record a fix to a previous commit. It doesn't matter what you do in the meantime, it only cares what is the HEAD, when you call git rebase --continue, and then it only performs what commands you specify in the rebase todo. You can even change the todo list at any time.

Yes, it's certainly possible to do all those things with Git. Compared to jj, it's just much harder to do, easier to mess up, and harder to recover from if you do mess up.
I just gave you an example how it is not "much harder".
git has rerere for this use case, jj doesn't - you have to find the conflict resolution manually in your history in this case if you made a mistake.
git has rerere, but jj doesn't because its equivalent is built in. https://github.com/jj-vcs/jj/issues/175#issuecomment-1079831... is some discussion about the differences here.
I have never understood the claim that git is hard. the docs are good and there are plenty of examples online.

feels the same when people say, "jq is hard i use python instead" like ok

I think what people usually mean is "scary" or "it's easy to mess up". Git is very easy to use until you mess up, then it can become complicated, and certain actions may have devastating consequences.

Two examples from recent memory:

Someone merged the develop branch into their branch, then changed their mind and reverted the merge commit specifically (i.e. reversing all the incoming changes), then somehow merged all of this into the develop branch, undoing weeks of work without noticing. I had to go in and revert the revert to undo the mistake. Yes they messed up, but these things happen with enough people and time.

Another very interesting issue that happened to a less technical person on the team was that their git UI somehow opened the terminal in the wrong folder. They then tried to run some command which made git suggest running 'git init', creating another git repo in that wrong location. Fast forward some days and we had an issue where people had to clean their repos, so I was in a call with the person helping them run the clean command. The UI opened the wrong location again, I helped them put in the command and it started cleaning. The problem was that this git repo was essentially at the top level of their disk, and since it was a fresh repo every single file was considered a new file, so it tried to delete EVERYTHING on their disk. This was of course mostly my fault for not running git status before the clean command, but this potential scenario was definitely not on my mind.

I mean. How can it be scary when you have git reflog.
The reflog doesn't capture everything. jj's oplog does.

An example of something that the reflog isn't going to capture is a git reset --hard losing your unstaged changes, whereas the equivalent flow and commands in jj would allow you to get those contents back.

The thing to keep in mind is that Git doesn't version the file system, it versions the index. This is because a file system guy like Torvalds knows that the file system is a shared resource and no program should think it can control its state. Therefore a Git repository doesn't consist of all the files below a directory, it consists of everything in the index.

Git does version everything that is in the repository and all these states occur in the reflog.

> The thing to keep in mind is that Git doesn't version the file system, it versions the index.

Yes. I think that this difference is what introduces a lot of friction, both in the model, and how people use it. The divergence between the files that exist on disk inside your working copy and what's actually tracked means lots of opportunities for friction that go away once you decide that it should. That doesn't mean things are perfect: for example, by default jj only snapshots the filesystem when you run a `jj` command, so you can still lose changes made in between those. You need to enable Watchman to get truly full logging here.

> all these states occur in the reflog.

Well, let's go back to the documentation for reflog:

> Reference logs, or "reflogs", record when the tips of branches and other references were updated in the local repository.

It only tracks changes to refs. That is, the states that refs have been in. So, one big example is detached HEADs: any changes you make to those, which still are contents of the repository, are not tracked in the reflog.

Even for refs, there are differences: the reflog says "ref was in state x and changed to state y" without any more details. jj's oplog keeps track of not only the state change, but the reason why: "rebased commit <sha> with these args: jj rebase -r <sha> -d trunk"

The reflog only tracks individual refs. Say we rebase multiple commits. The reflog still just says "the head of this branch was in state x and changed to state y" but the oplog says "a rebase happened, it affected all of these commits' refs in these ways," that is, it's just inherently more rich in what it tracks, and does it across all relevant commits, not only the refs.

This doesn't mean the reflog is bad! It's just a very specific thing. Git could have an operation log too, it's just a different feature.
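
To make the distinction concrete, here is a deliberately simplified toy model in Python. It is a sketch under assumptions, not how either tool stores its logs: `ToyRepo`, `run`, and `undo` are made-up names, and the "operation" is just a description string plus a snapshot of every ref before the change.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    refs: dict = field(default_factory=dict)
    reflog: list = field(default_factory=list)  # per-ref entries: (name, old, new)
    oplog: list = field(default_factory=list)   # per-operation entries: (description, refs before)

    def run(self, description: str, updates: dict) -> None:
        """Apply one operation that may move several refs at once."""
        before = dict(self.refs)  # snapshot of the whole ref state
        for name, new in updates.items():
            # Reflog style: one entry per ref, with no record of why it moved.
            self.reflog.append((name, self.refs.get(name), new))
            self.refs[name] = new
        # Oplog style: one entry for the whole operation, including the reason.
        self.oplog.append((description, before))

    def undo(self) -> str:
        """Restore every ref to its state before the last operation."""
        description, before = self.oplog.pop()
        self.refs = before
        return description


repo = ToyRepo({"feature-a": "a1", "feature-b": "b1"})
repo.run("rebase both features onto trunk", {"feature-a": "a2", "feature-b": "b2"})
```

In this toy, the reflog ends up with two unrelated-looking entries while the oplog has a single entry that says what happened and can be undone in one step.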

That makes sense
I am (was) a git expert. I’ve written a git implementation. I’ve used it since shortly after it was first announced.

Git has lots of sharp edges that can get hairy or at least tedious really rapidly. You have to keep a ton of random arcana in working memory at all times. And a bunch of really useful, lovely workflows are so much of a pain in the ass that you don’t even conceive of doing them.

I learned jj in one day and never went back.

^^^ This aspect of the arcana one is required to keep in working memory is an issue that's glossed over far too frequently. I understand that git is a developer focused tool, but requiring a user to keep a constant mental burden in working memory completely bars non-developers from using git in any legitimate way.

I'm not a welder or a metalworker, but I do know how to weld. I use a welder a handful of times per year when I need/want to. Welding is dangerous, and achieving excellence is a difficult and long road. But I can use the same tools as a pro and still get a few pieces of metal stuck together without having to relearn and restudy the whole system each time something goes wrong.

I haven't used jj in anger yet, but I think it might at least be approaching that style of developer tool.

So based on my experience teaching git (I remember a cvs to git migration …), reality tells me people find git difficult.

Now, once you teach them it’s a commit graph with names, some of them floating, some people get it.
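
A rough sketch of that picture, as a toy Python model rather than anything git actually stores: commits are nodes that point at their parents, and branch names are just labels pointing at a commit. The commit IDs, `parents`, `refs`, and `ancestors` are all hypothetical names for illustration.

```python
# Each commit maps to the list of its parent commits.
parents = {
    "a": [],          # root commit
    "b": ["a"],
    "c": ["b"],
    "d": ["b"],
    "e": ["c", "d"],  # merge commit with two parents
}

# Names are pointers into the graph; moving one does not change any commit.
refs = {"main": "e", "feature": "d"}


def ancestors(commit: str, parents: dict) -> set:
    """Return the commit plus everything reachable through parent links."""
    seen = set()
    stack = [commit]
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        stack.extend(parents[current])
    return seen


feature_history = ancestors(refs["feature"], parents)
main_history = ancestors(refs["main"], parents)
```

Once "history of a branch" is read as "everything reachable from where the name points", operations like merge and reset become statements about adding nodes or moving a label.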

The thing is, not everyone is comfortable with a commit graph, and most people are not - just like people get lists and arrays but graphs are different.

So I agree with you in principle (it shouldn’t be difficult), but most people don’t have a graph as a mental model of anything, and I think that’s the biggest obstacle.

I have burned git into my brain, so it's no longer hard to me. OTOH, I only pull out jq once every six months or so, and I just barely scrape by every time.
and i honestly would rather parse json inside ipython and then move to a script, than keep invoking `| jq` time and time again.
From time to time, I end up in a state which I don't know how to recover from, and it's very frustrating to have to take an hour or two from my real work in order to try to figure out how to get out of that state.

https://roadrunnertwice.dreamwidth.org/596185.html

The reflog is the failsafe. It is the tool that fixes all the scary states, as it keeps a journal of the states of each ref in the repo (like branch heads).

You can see where you were and hard reset back, no matter what state you are in.

If you like the reflog, you'll love jj's oplog: it's like the reflog, but for all repository state changes.
I've worked with many folks over the years after learning myself...

The feeling of complexity comes from not yet understanding that commits are just sets of changes to files. They are then thrown off the scent by new terms like origin clone vs push and pull, merge vs rebase, HEAD increment notation vs other commit hashes.

Once people start with a local understanding of using git diff and git add -p they usually get the epiphany. Then git rebase -i and git reflog take them the rest of the way. Then add the distributed push and fetch/pull concepts.

Parsing json is so much easier with Python than jq, it's not even funny. That doesn't mean jq is useless, because sometimes keeping it in the shell is the best option. But in terms of ease of use jq is shit.
I think people who grasp the basic idea of a commit graph and approach it in terms of "this is how I want to manipulate the graph, what are the tools that will allow me to do this?" find it easy, and people who approach it in terms of building a cookbook of commands that comprise a workflow don't.
I am somebody who deeply cares about my commit graph. I want to maintain clean history, and I want to regularly amend previous commits (until merged) in order to tell a coherent story about development. I want to keep unrelated commits on separate branches, so they can be reviewed and merged independently.

I understand how to do these things, but git’s interaction model makes it tedious at best and hard at worst.

jj’s interaction model makes these things simple, straightforward, and obvious in the overwhelming majority of cases.

Even though I never found git hard, I find jj better.
My perspective, git isn't hard, but coordinating git workflows in teams with a merge backlog is a real pain in the ass.
I've long been fascinated by how bimodal understanding of git is. I'm one of the lucky ones to whom it came naturally, but there's clearly a large population who finds git challenging even after investing significant time and effort into learning it.

I don't see this anywhere nearly as drastically with other tools.

> after investing significant time and effort into learning it.

And the significant time and effort amounts to a total of 15 seconds.

There are simply people who've rtfm and people who haven't
The git documentation is one of the nastiest docs ever just like the whole git ui. It’s technically entirely correct, but won’t help you understand how it works in any way.

It’s exactly like folks in 1995 telling you to rtfm when you’re trying to install Linux from a floppy disk. It’s doable, but annoying, and it’s not that easy.

That's really unexpected. To me, git documentation was one of the best, cleanest official docs I've ever read.

Just in case, I'm talking about the Pro Git book [0]. I remember reading it on my kindle while commuting to office by train. It was so easy to understand, I didn't even need a computer to try things. And it covers everything from bare basics, to advanced topics that get you covered (or at least give you a good head start) if you decide to develop your own jujutsu or kurutu or whutuvur.

[0] https://git-scm.com/book/en/v2

This is exactly what I meant. https://git-scm.com/book/en/v2/Git-Branching-Branches-in-a-N...

The book says that ‘To really understand the way Git does branching, we need to take a step back and examine how Git stores its data’ then it starts talking about trees and blobs.

At that point you’ve lost almost everyone. If you have a strong interest in vcs implementation then fine, otherwise it’s typically the kind of details you don’t want to hear about. Most people won’t read an entire book just to use a vcs, when what they actually want to hear is ‘this is a commit graph with pointers’.

I agree with you: the information is there. However I don’t think you can in good faith tell most people to rtfm this, and that was my point.

That explains it for some people, but there's something more here.

Hell, I've personally mentored people who struggled with git and I could feel their struggle.

I'm not saying that learning git was an insurmountable task for them, but their struggle was not something that I had to go through.

Quick, what does git pull foo do if foo is a branch vs a remote and how do you fix it if you messed up/which is preferred when both exist?
> if foo is a branch

It does `git pull <default remote> foo`.

> vs a remote

It gives an error, because you haven't specified the remote.

I don't know what behaviour you find intuitive here?

> how do you fix it if you messed up/which is preferred when both exist?

git pull is YOLO mode, so I never do it, but I would just reset the branches where I want them to be? You get a summary with the old and new commit hashes, so resetting is really easy.

I've never encountered this case in real life, so I don't know. Maybe I'll try later out of curiosity.

But whatever state I'm in, I'm sure I can reset back to where I was using reflog.

sometimes this is the fault of the manual and not the people
that's okay, it doesn't need to be your personal experience. you just need to understand that "git gud" is not a sustainable or intelligent mantra for tool design and selection
> So, what am I missing?

Here: git rebase is slightly broken in conflict handling. It can be made simpler to understand with jj.

What if we don't use git rebase at all? What does jj have to offer us?
Yes, I don't rebase, I only merge. We squash commits on merge of MR/PR anyway, so there is no value to rebase for us AFAICT. It also removes a ton of gnarly situations you can find yourself in when you mess up a rebase somehow.
But I'm not offering anything to you. Unless there's a budget, in which case I will make that svn fly in circles over your enterprise, dear sir.
one thing which causes problems with git for me is collaborative work without using a "git server". This usually comes up in a homelab situation with no access to a "git server" or ssh server. One thing with jj is i can use existing sharing mechanisms like dropbox, google drive or if nothing else just copying the jj folder (granted all of those are bad ideas w.r.t vcs but still).
I don’t understand this critique. You can copy a .git folder around just fine. You can expose a “server” by giving friends ssh keys that can only access the git stuff. In fact for a long time that’s how git “was done” at various corps.
> You can copy a .git folder around just fine.

You can do this, but due to file locking, you can corrupt the state if it's shared. jj is specifically designed so that it won't corrupt the repo in this way: https://jj-vcs.github.io/jj/latest/technical/concurrency/

Create a bare repo on the USB stick(/dropbox/Google Drive/random folder) and just push to the USB stick.
Historically speaking, this does not change things, at least for Dropbox/Google Drive. Stack Overflow has tons of posts like this: https://stackoverflow.com/questions/2199637/is-it-possible-t...

That said, I haven't tried this lately, maybe it's gotten more robust over time. But historically, even a bare repo on something like Dropbox has issues.

Sure, but this seems to be more of an issue with Dropbox, not with Git. When I run a database on Dropbox, the same problems occur. I wouldn't trust these to even preserve file attributes correctly, so I would put things into a tarball before uploading (optionally also encrypting).
Git server is just a directory. It may or may not have actual content files in it (aka bare). In fact, any git clone of any repository is also a server on its own (and clients can have multiple "remote"s to pull from).
My go-to solution for this problem is a git init --bare --shared=group repository in a shared mountable drive. Then you can declare that repo origin, and tada, git push/pull works.
> git init --bare --shared=group

This is a very git command.

It does exactly what it says on the tin:

It calls "git" to "init"ialize a repository, which we don't need a working tree for ("bare") and that it's going to be "shared" with members of the "group".

Not to be a jerk, but 'hundreds of devs and dozens of MR per day' is not 'huge repos'. Certain functionality only becomes relevant at scale, and what is easy on a repo worth hundreds of megabytes doesn't work anymore once you have terabytes of source code to deal with.
> terabytes of source code

You sure that exists?

Git repositories that contain terabytes of source code?

I could imagine a repo that is terabytes but has binaries committed or similar... But source code?

Google's monorepo is in fact terabytes with no binaries. It does stretch the definition of source code though - a lot of that is configuration files (at worst, text protos) which are automatically generated.
Google had 86TB of sourcecode data in Piper way back in 2016.
Dang, that's mind boggling - especially if I keep in mind that a book series like lord of the rings is mere kilobytes if saved as plain text.

Having 86 TB of plain text/source code - I can't fathom the scale, honestly

Are you absolutely sure there aren't binaries in there (honestly asking, the scale is just insane from my perspective - even the largest book compilation like Anna's isn't approaching that number - if you strip out images ... And that's pretty much all books in circulation - with multiple versions per title)

Each snapshot of the repo isn't that big, but all the snapshots together, plus all the commit metadata and such, are
git could never, but piper at google is way over that figure. Way, way over.
Microsoft has actually done a lot of work to scale git to large repos
It's why there's a special Microsoft Git VFS (a lot like VFS at google that is also referenced in the talk).

It was made to make working on Windows source code possible with Git.

Very sure, i work in one