| On 1.: As others have already noted, Git has a GC mechanism, which means that unreferenced objects can linger in any copy of your repo for a while. If you need to version binary files, you'd be better off using git-lfs or git-annex. If you don't need them, just nuking them outright with BFG or `git filter-branch` is fine, too. If you'd like to try git-lfs: it also includes tooling to retroactively migrate your repository[1], but that will rewrite your history (although BFG does that, too). If GCing your repo does not reduce its size, you'll probably have to hunt down any remnant branch and/or tag that still references the old history and thus "keeps it alive".

On 2.: I'm not entirely certain I understand you correctly. Which commits were duplicated? The resulting merge commits? Assuming that, you could probably build a shell script to get rid of them if you wanted to (or use something like `git checkout prod; git rebase -ir <first_commit>` and remove the duplicated merges yourself). From a repo perspective, though, the duplicates shouldn't cause much trouble (the additional space required will be negligible), and removing them would, again, mean rewriting history, potentially causing issues for others who still have local copies referencing your old commits.

Also, if you go the rebase route: make sure you understand the todo list Git will generate for you. With `-r`, it will preserve merges, but it represents them with `label`, `reset`, and `merge` commands, which is not very intuitive, so you'll have to wrap your head around that first.

You could also try to achieve the same result with `git filter-branch` and `--commit-filter`. In that case, you'd probably want to write a script that only runs `git commit-tree` if the tree ID passed to the filter differs from the tree of the first parent commit (this should weed out all commits that don't change anything).

[1]: https://github.com/git-lfs/git-lfs/blob/master/docs/man/git-...
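For point 1, a rough sketch of how one might list the refs that can keep old history alive and then force an immediate GC. The function name `force_gc` is made up for illustration, and expiring the reflog throws away your local safety net, so only try this on a copy you can afford to lose:

```shell
# Hypothetical helper: show all refs, drop reflog entries, prune, report size.
force_gc() {
  # Every branch, tag, remote-tracking ref, stash etc. that still pins history.
  git for-each-ref --format='%(refname) %(objectname:short)'
  # Reflog entries also keep old commits reachable, so expire them right away.
  git reflog expire --expire=now --all
  # Repack and delete unreachable objects without the usual grace period.
  git gc --quiet --prune=now
  # Human-readable summary of what is left on disk.
  git count-objects -vH
}
```

Run `force_gc` inside the repository. If the reported `size-pack` does not shrink, one of the listed refs is probably still pointing at the old history.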
Edit: As an idea for 2.: Describing issues in commit histories in prose is tricky :). If your problem looks different from what I assumed, you could create a bogus Git repo that showcases the pattern in its history and put it on GitHub as a reference. You can create "empty" commits for that via `git commit --allow-empty`. |
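A minimal sketch of such a bogus repo, assuming `user.name` and `user.email` are already configured. The function name `make_demo_repo` and the branch name `dev` are placeholders; only `prod` comes from the discussion above:

```shell
# Hypothetical helper: build a throwaway repo made only of empty commits and one merge.
make_demo_repo() {
  git init -q "$1" || return 1
  (
    cd "$1" || exit 1
    git commit -q --allow-empty -m "initial"
    git checkout -q -b prod
    git checkout -q -b dev
    git commit -q --allow-empty -m "work on dev"
    git checkout -q prod
    # --no-ff forces a real merge commit even though a fast-forward would do.
    git merge -q --no-ff -m "merge dev into prod" dev
  )
}
```

After `make_demo_repo /tmp/demo`, running `git log --graph --oneline --all` in that directory shows the shape of the history, which is usually easier to discuss than a description in prose.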
If you've GC'd your repo and you're sure there are no references to old commits lying around (keep remote branches and the like in mind, too!), this might help you discover large objects that are still in your "new" history:
If run in bash, this should print the 20 largest objects in your repo and their size (in bytes):
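One way to get such a listing, as a sketch rather than the only approach. It is wrapped in a function so the count can be changed, and the name `largest_objects` is made up for illustration:

```shell
# Hypothetical helper: print the N largest objects reachable from any ref, largest first.
# Output columns: size in bytes, object type, object ID, path (if any).
largest_objects() {
  local count="${1:-20}"
  git rev-list --objects --all \
    | git cat-file --batch-check='%(objectsize) %(objecttype) %(objectname) %(rest)' \
    | sort -rn \
    | head -n "$count"
}
```

Calling `largest_objects` with no argument prints the top 20; `largest_objects 5` prints the top 5. The reported size is the uncompressed object size, not the space it takes up inside a packfile.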
If you find some blob that is too large, you could then search for its name like this:
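A small sketch of such a lookup, assuming you have the object ID (or a unique prefix of it) from a size listing. The function name `find_object_path` is hypothetical:

```shell
# Hypothetical helper: print "<object-id> <path>" for reachable objects
# whose ID starts with the given prefix.
find_object_path() {
  git rev-list --objects --all | grep "^$1"
}
```

Once the path is known, `git log --all --oneline -- <path>` lists the commits that touched it, which helps decide what to rewrite.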