by SoftTalker 376 days ago
Cool discovery but the article could have been about 1/10 as long and still communicated effectively. At least they didn't post it as a video, so it was easy to skim to the important details.

Interesting that multiple people are noticing this same thing. For me, this could have been:

"We found that a large portion of the 48 hours taken to backup our rails respository was due to a function in `git bundle create` that checks for duplicate references entered as command line arguments. The check contained a nested for loop ( O(N2) ), which we replaced with a map data structure in an upstream fix to Git. The patch was accepted, but we also backported fix without waiting for the next Git version. With this change, backup time dropped to 41 minutes."

It's the "impact" style of technical write-ups: sell the problem and the scale, then present the solution, which is thus presented and understood through the lens of business and customer success.

Generously, this writing style is supposed to show the business value of teams and individuals, for promotions or other recognition. But yeah, it can be frustrating to read this style.

Yes. I read the whole article thinking that this must have been generated by an LLM, because at least the style resembles it.
Don't take my bullet points away from me
They came for the em dashes, and I did not speak up. Then they came for the bullet points, and I did not speak up...
Glad I wasn’t the only one who thought this. The post is also missing one obvious thing that I expect in any technical post: code snippets. Let me see the code.

ChatGPT has ruined bullet points for the rest of us…

No offense, but writing this blog post couldn’t have taken more than a few minutes, so why spoil it with an LLM? Shoot, use one to check grammar and recommend edits, even.

I thought exactly the same. The reading experience would definitely have been better with less text.
That was also my thought.
Em dashes and bullet points!
For those who haven't read the article yet, scroll down to the flame graph and start reading until it starts talking about backporting the fix. Then stop.
"How we decreased reading 'How we decreased GitLab repo backup times from 48 hours to 41 minutes' times from 4.8 minutes to 41 seconds"
it could have been longer. I still don't know why they were doing backup bundles with two refs :)
They weren't. If you look at the fix [1], the dedupe loop was run in all cases, not just those with known dupes, so the performance hit applied to any bundle with lots of refs.

[1] https://github.com/git/git/commit/bb74c0abbc31da35be52999569...

But why couldn't they just dedupe the refs from the command line before starting the actual bundling - surely there are never more than a couple of hundred of those (one per branch)?
The point is the performance hit had nothing to do with dupe count (which could be zero), and everything to do with ref count.
Spot on. Some of our repositories at GitLab can contain millions of references.
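
To put rough numbers on that point, here is a small Python sketch that counts the equality checks a pairwise dedupe performs. It only illustrates the scaling argument made above and is not Git's actual code; `pairwise_comparisons` is a hypothetical helper.

```python
def pairwise_comparisons(refs):
    """Count the equality checks made by a nested-loop duplicate check."""
    comparisons = 0
    unique = []
    for ref in refs:
        duplicate = False
        for kept in unique:
            comparisons += 1
            if kept == ref:
                duplicate = True
                break
        if not duplicate:
            unique.append(ref)
    return comparisons


# 1,000 distinct refs, i.e. zero duplicates: every ref is compared
# against every ref kept before it.
distinct = [f"refs/heads/branch-{i}" for i in range(1000)]
print(pairwise_comparisons(distinct))  # 499500, which is 1000 * 999 / 2
```

With no duplicates at all, n refs cost n(n-1)/2 comparisons, so a repository with a million refs would need on the order of 5 × 10^11 checks before any bundling starts. Duplicates actually make the loop cheaper, since the inner scan stops at the first match.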