Cool discovery but the article could have been about 1/10 as long and still communicated effectively. At least they didn't post it as a video, so it was easy to skim to the important details.
Interesting that multiple people are noticing this same thing. For me, this could have been:
"We found that a large portion of the 48 hours taken to backup our rails respository was due to a function in `git bundle create` that checks for duplicate references entered as command line arguments. The check contained a nested for loop ( O(N2) ), which we replaced with a map data structure in an upstream fix to Git. The patch was accepted, but we also backported fix without waiting for the next Git version. With this change, backup time dropped to 41 minutes."
It's the "impact" style of technical write-ups: sell the problem and the scale, then present the solution, which is thus presented and understood through the lens of business and customer success.
Generously, this writing style is supposed to show the business value of teams and individuals, for promotions or other recognition. But yeah, it can be frustrating to read this style.
Glad I wasn’t the only one who thought this. The post is also missing one obvious thing that I expect in any technical post: code snippets. Let me see the code.
ChatGPT has ruined bullet points for the rest of us…
No offense but writing this blog post couldn’t take more than a few minutes, why spoil it with LLM? Shoot, use one to check grammar and recommend edits even.
For those that haven't read the article yet, scroll down to the flame graph and start reading unit it starts talking about back porting the fix. Then stop.
They weren't, if you look at the fix [1] the dedupe loop was run in all cases, not just those with known dupes, so the performance hit was any bundle with lots of refs.
But why couldn't they just dedupe the refs from the command line before starting the actual bundling - surely there are never more than a couple of hundred of those (one per branch)?
"We found that a large portion of the 48 hours taken to backup our rails respository was due to a function in `git bundle create` that checks for duplicate references entered as command line arguments. The check contained a nested for loop ( O(N2) ), which we replaced with a map data structure in an upstream fix to Git. The patch was accepted, but we also backported fix without waiting for the next Git version. With this change, backup time dropped to 41 minutes."