by avalys 1835 days ago
Things have gotten a little better. But, try to do something off the beaten path in Git, and you may ultimately get the joke.

For example: “two weeks ago an intern accidentally committed a file containing IP we’re not allowed to use, we need to erase it from the repository and all developer machines.”

Have fun with that one!

EDIT: I mean, try to figure this out from the official Git documentation (https://git-scm.com/docs). No, Stack Overflow and GitHub are not the official Git documentation. Believe it or not, the idea that "Git is hard to use" predates Stack Overflow.

7 comments

I have had to do it once during the 12 years I have used git, so I seriously doubt that this is why people think git is hard. And I think that googling it would be fine in that case. That said: since I have done it once I could easily figure out how to do it again and it wasn't hard, just a bit cumbersome due to git's distributed nature.
Is Googling not allowed? This situation is pretty common, so there are plenty of SO answers and articles on how to accomplish rewriting history to erase it from the repo.

Removing from developer machines is a separate issue and requires you to be able to coordinate your Devs.

If you meant that it's not simple to work out from scratch what you should do without lots of reading and trial and error...that kinda goes for a lot of tools, no?

Yes, but git seems to be one of those tools where laypeople genuinely seem unable to derive how to do complex tasks from first principles. Lord knows I can’t. If your Googling doesn’t turn up someone who’s had your exact problem, you will have to burn a long time figuring out how to do what you want.
> For example: “two weeks ago an intern accidentally committed a file containing IP we’re not allowed to use, we need to erase it from the repository and all developer machines.”

Technically, the issue was actually pushing that commit to the remote repository.

I think the best advice one can give people using it is to run:

  git log -p origin/master..HEAD
and look at the commit messages and associated diffs to see if there's anything there that shouldn't be there before they actually run git push.
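For anyone who wants to try that check safely, here is a minimal sketch in a throwaway sandbox. The temp directory, the bare "origin" repository, and the file name vendor_ip.txt are all made up for illustration; only the git log -p origin/master..HEAD line comes from the advice above.

```shell
set -eu

# Throwaway sandbox: a bare repo stands in for the shared remote (hypothetical setup).
work=$(mktemp -d)
git init -q --bare "$work/origin.git"
git clone -q "$work/origin.git" "$work/dev" 2>/dev/null
cd "$work/dev"
git config user.email "dev@example.com"
git config user.name "Dev"
# Force the branch name so origin/master exists whatever the local default is.
git symbolic-ref HEAD refs/heads/master

# One commit that has already been pushed.
echo "hello" > README
git add README
git commit -q -m "Initial commit"
git push -q -u origin master

# One local commit that has NOT been pushed yet.
echo "not ours" > vendor_ip.txt
git add vendor_ip.txt
git commit -q -m "Add vendor file"

# How many commits would 'git push' send?
unpushed=$(git rev-list --count origin/master..HEAD)
echo "unpushed commits: $unpushed"

# The review step itself: messages and diffs of everything not yet on origin/master.
review=$(git log -p origin/master..HEAD)
echo "$review"
```

If the offending file shows up in that output, it can still be fixed locally before anything reaches the shared repository.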
> git log -p origin/master..HEAD

See, THIS is the problem. Ugly, inconsistent, clumsy use of the English language, and confusing.

This will go on my git sheet, with a comment as to what it actually does because I don't have the time to actually unpack that from first principles. I've got better things to do than become an expert on needlessly complicated software.

> See, THIS is the problem. Ugly, inconsistent, clumsy use of the English language, and confusing.

It's a command line interface, not plain English. What's ugly and inconsistent about the git log command as was quoted in your reply?

> I've got better things to do than become an expert on needlessly complicated software.

As a software developer, I have to read through a lot of documentation to be able to use programming languages, SQL, data stores, unix utils, etc. I don't see why it would be any different for a VCS.

I think the actual issue is that people aren't willing to read through the documentation to understand what a command does and what options are available.

As for the command itself, the -p switch shows the diff associated with each commit that git log lists. origin/master is the remote-tracking branch for master on origin (most likely the base branch that the person is working on). The .. is a range operator, and HEAD represents the latest commit on the currently checked-out branch on the local machine. Put together, the command lists the commits reachable from HEAD that are not yet reachable from origin/master.
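A small sketch of how the range operator behaves, for anyone who wants to poke at it. It uses two local branches, base and topic, as hypothetical stand-ins for origin/master and the working branch, so no remote is needed; the repository lives in a temp directory.

```shell
set -eu

# Throwaway repo; 'base' plays the role of origin/master (hypothetical names).
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"
git symbolic-ref HEAD refs/heads/base

echo one > file.txt
git add file.txt
git commit -q -m "first"

# Two more commits on a topic branch.
git checkout -q -b topic
echo two >> file.txt
git commit -q -am "second"
echo three >> file.txt
git commit -q -am "third"

# A..B means: commits reachable from B, excluding those reachable from A.
range_count=$(git rev-list --count base..HEAD)
# The same set written with the explicit exclusion syntax.
exclude_count=$(git rev-list --count HEAD ^base)
# Reversing the range asks what base has that HEAD lacks: nothing here.
reverse_count=$(git rev-list --count HEAD..base)

echo "base..HEAD: $range_count, HEAD ^base: $exclude_count, HEAD..base: $reverse_count"
```

The order matters: swapping the two sides of .. answers the opposite question, which is an easy mistake to make when checking what is about to be pushed.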

I use plenty of command line interfaces every day. Most of them are pleasant, predictable, and easy to remember. None of them consistently confuse me like git does. (How many different things does 'checkout' do?)

SQL is not only much more intuitive than git, it gives me amazing leverage to deliver value to clients. By comparison, git wastes my time. There's zero or minimal competitive advantage to using it over any other VCS.

Erase from the repo, a little non-standard, but fine. Being asked to remove it from all developer machines sounds like someone misunderstood how version control works. Was that a real life example you hit?
They might have a model of version control in their head that predates distributed version control systems - I never used one myself, but the code base I work on still has scars here and there from the era when only one developer could have any single file checked out.
Not a misunderstanding, a requirement. If the developers cannot have that data (legal reasons? Secrets?) it must be deleted.

Probably has to be done outside git, though. Maybe one of the corporate virus scanners will let you define a local signature.

It's rather simple: remove it from the origin repo using BFG Repo-Cleaner or whatever, then ask devs to delete and re-clone the repo. Not everything needs a complex technical solution.
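A rough sketch of the "remove it from the repo" half, run against a throwaway repository. Note the swap: this uses git's built-in filter-branch rather than BFG, purely so the example needs nothing beyond git itself; git's own documentation steers people toward other tools for real cleanups. The file name vendor_ip.txt and the repo contents are invented for illustration.

```shell
set -eu
# Skip filter-branch's startup warning and its delay.
export FILTER_BRANCH_SQUELCH_WARNING=1

# Throwaway repo with an offending file committed in the middle of history.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "dev@example.com"
git config user.name "Dev"

echo "code" > main.c
git add main.c
git commit -q -m "Initial commit"

echo "not ours" > vendor_ip.txt
git add vendor_ip.txt
git commit -q -m "Add vendor file"

echo "more code" >> main.c
git commit -q -am "Keep working"

# Rewrite every ref, dropping the file from each commit's index;
# --prune-empty discards commits that end up with no changes.
git filter-branch -f --prune-empty \
  --index-filter 'git rm -q --cached --ignore-unmatch vendor_ip.txt' \
  -- --all >/dev/null

# filter-branch keeps backups under refs/original; remove them, then expire
# reflogs and prune so the old objects are actually gone from this clone.
git for-each-ref --format='%(refname)' refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc -q --prune=now

# Verify: no reachable object is still named vendor_ip.txt.
hits=$(git rev-list --all --objects | grep -c 'vendor_ip.txt' || true)
commits=$(git rev-list --count HEAD)
echo "remaining references: $hits, commits left: $commits"
```

This only cleans one clone. The rewritten history still has to be force-pushed, and every other copy either re-cloned or cleaned the same way, which is the coordination problem the rest of the thread is arguing about.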
Git clones the entire remote repository to each developer's machine. So, if you accidentally committed something you shouldn't have two weeks ago, everyone will now have a copy in their local repo. And you can't always just tell people to delete their local repos and start again, since they might have local branches they're working on, etc.
I don't think this is even possible with SVN or CVS, is it?
At least with SVN, there is one option that is pretty similar to git’s filter-branch: svndumpfilter. You dump the entire history of the SVN repo to a file, edit it, and then load it into a new SVN repo. I used this technique to pre-process a repo to remove large files before migrating to Git. The file format is simple enough that you can easily write a program to edit the stream.
It’s very easy in CVS, which is why some people prefer CVS to any distributed solution.
Curious why the IP address has to be obliterated from history instead of just correcting it in a new commit? An IP does not seem sensitive like a secret or private key.

EDIT: Sorry, my bad. I misread.

Presumably they did not mean an Internet Protocol address, but instead data containing disallowed intellectual property.
I believe in this case, IP == Intellectual Property.