by lol768 847 days ago
Why does GitHub provide no way for a repository administrator to self-service a git gc? I seem to recall reading a blog post that suggested GitHub had invested a bunch of engineering resource in making cleaning up unreachable objects much more scalable.

I haven’t reached out internally (and I’m not on a related team), so the following is my own understanding.

The blog post was most likely this one: https://github.blog/2022-09-13-scaling-gits-garbage-collecti...

I think it explains the product vision well (why it’s automatic):

> We have used this idea at GitHub with great success, and now treat garbage collection as a hands-off process from start to finish.

GitHub also provides docs on what to do if there is sensitive data in your repo. The process is quite involved, but given how much GitHub knows internally about both its own internals and git's, I would trust their advice:

https://docs.github.com/en/authentication/keeping-your-accou...

You can also contact support or create/join a community discussion: https://github.com/orgs/community/discussions

If you feel strongly that a feature you need is missing, adding your voice increases the visibility of the request. I think GitHub does offer solutions to this problem though, including automatic GC eventually.

That's the actual insane problem.

I noticed long ago that unreferenced commits survive on GitHub for a long time, but I couldn't find a way to discover them.
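
To make "unreferenced" concrete: a commit is unreferenced when no branch or tag can reach it by following parent links. Here is a toy sketch of that reachability check in Python. It is not how git or GitHub implement it; the commit graph, the ref name, and the function name `unreachable_commits` are all made up for illustration.

```python
from collections import deque


def unreachable_commits(parents, refs):
    """Return the commits in `parents` that no ref tip can reach.

    parents: dict mapping a commit id to the list of its parent ids
    refs:    dict mapping a ref name to the commit id it points at
    """
    seen = set()
    queue = deque(refs.values())
    while queue:
        commit = queue.popleft()
        if commit in seen:
            continue
        seen.add(commit)
        # Walk backwards through history via parent links.
        queue.extend(parents.get(commit, ()))
    return set(parents) - seen


# Hypothetical history: c3 was force-pushed away, so no ref points at it.
parents = {"c1": [], "c2": ["c1"], "c3": ["c2"], "c4": ["c2"]}
refs = {"refs/heads/main": "c4"}

print(sorted(unreachable_commits(parents, refs)))  # ['c3']
```

A garbage collector would treat `c3` as a candidate for deletion. Until it actually runs, the object still exists and can be fetched by anyone who knows its hash.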

I know that GitHub stores the objects of many repositories together, but they should have implemented and offered a way to gc them when they came up with that optimization.

Sure, there would still be the chance that someone already obtained the objects by the time you gc them, but it's a much lesser risk than leaving them there indefinitely (and they could provide a log of the last fetches to better assess the impact of the erroneous push).

> chance that someone already obtained the objects by the time you gc them

I was under the impression that there are various 'mirror github' projects that listen to the GitHub change event API and immediately crawl some/all commits.

If so, this isn't a chance - it is certain.

OK, there's also the problem of accesses through the web interface, but it probably wouldn't take much to provide a short-lived log of them as well.