| HN Mirror



	by reisse 697 days ago
	> After some changes, which include deleting sensitive information and proprietary code and squashing all the history into one commit, they change the repo to public. I know this might look like a valid approach at first glance, but... it is stupid for anyone who knows how git or GitHub API works? Remote (GitHub's) reflog is not GC'd immediately: you can try to get commit hashes from the events history via the API, and then try to get the commits from the reflog.
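A minimal Python sketch of the hash-harvesting step described above, assuming event records shaped like GitHub's `PushEvent` (a `payload` with `before`, `head` and a `commits` list whose entries carry a `sha`). The function name is hypothetical and it only parses records you already have; fetching them (from the REST events endpoint or from GH Archive's hourly JSON dumps) is left out, and the exact payload fields GitHub includes may have changed over time.

```python
from typing import Any


def shas_from_push_events(events: list[dict[str, Any]]) -> set[str]:
    """Collect every commit SHA mentioned in PushEvent-style records.

    Assumes the PushEvent payload shape: payload.before, payload.head
    and payload.commits[*].sha. Other event types are skipped.
    """
    found: set[str] = set()
    for event in events:
        if event.get("type") != "PushEvent":
            continue
        payload = event.get("payload", {})
        for key in ("before", "head"):
            sha = payload.get(key)
            # An all-zero SHA means "no commit" (ref created or deleted).
            if sha and set(sha) != {"0"}:
                found.add(sha)
        for commit in payload.get("commits", []):
            if "sha" in commit:
                found.add(commit["sha"])
    return found
```

The `before` field is the interesting one: after a force push it records the tip the ref had before it was overwritten, which is exactly the squashed-away history.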


Perseids 697 days ago

> it is stupid for anyone who knows how git or GitHub API works?

You need to know how git works and GitHub's API. I would say I have a pretty good understanding of how (local) git works internally, but I was deeply surprised by GitHub's brute-forceable short commit IDs and the existence of a public log of all reflog activity [1].
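To put "brute-forceable" in numbers: an abbreviated ID of 4 hex digits (git's minimum abbreviation length) has only 16^4 = 65,536 possible values. The sketch below just enumerates that space; the helper names, the `owner`/`repo` values and the idea of probing `https://github.com/<owner>/<repo>/commit/<prefix>` URLs are illustrative assumptions, and no requests are sent.

```python
from itertools import product

HEX = "0123456789abcdef"


def short_id_candidates(length: int = 4):
    """Yield every lowercase hex prefix of the given length (16**length values)."""
    if length < 4:
        # git does not accept abbreviations shorter than 4 hex digits
        raise ValueError("git abbreviations are at least 4 hex digits")
    for digits in product(HEX, repeat=length):
        yield "".join(digits)


def candidate_urls(owner: str, repo: str, length: int = 4):
    """Hypothetical helper: turn each prefix into a commit URL one could probe."""
    for prefix in short_id_candidates(length):
        yield f"https://github.com/{owner}/{repo}/commit/{prefix}"
```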

When the article said "You might think you’re protected by needing to know the commit hash. You’re not. The hash is discoverable. More on that later." I was not able to deduce what would come later. Meanwhile, data access by hash seemed like a non-issue to me – how would you compute the hash without having the data in the first place? Checking that a certain file exists in a private branch might be an information disclosure, but not usually a problematic one.

And in any case, GitHub has grown so far away from its roots as a simple git host that implicit expectations change as well. If I self-host my git repository, my mental model is very close to git internals. If I use GitHub's web interface to click myself a repository with complex access rights, I assume they have concepts in place to thoroughly enforce these access rights. I mean, GitHub organizations are not a git concept.

[1] https://www.gharchive.org/


reisse 697 days ago

> You need to know how git works and GitHub's API.

No; just knowing how git works is enough to understand that force-pushing squashed commits or removing branches on the remote will not necessarily remove the actual data from the remote.
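A toy model (not git's actual object storage) of this point: a (force) push only moves a ref, and the old commits stay in the object store until garbage collection removes whatever no ref can reach. Real git is more conservative than this sketch: it also keeps objects still referenced from reflogs and unreachable objects newer than the `gc.pruneExpire` grace period, and when a hosting service runs GC on its side is not something the client controls. Class and method names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRepo:
    """Toy model of commits, refs and garbage collection (not git's real storage)."""
    parents: dict[str, list[str]] = field(default_factory=dict)  # commit id -> parent ids
    refs: dict[str, str] = field(default_factory=dict)           # ref name -> commit id

    def commit(self, cid: str, parents: list[str]) -> None:
        self.parents[cid] = parents

    def set_ref(self, ref: str, cid: str) -> None:
        # A (force) push just overwrites this pointer; no commit is deleted.
        self.refs[ref] = cid

    def reachable(self) -> set[str]:
        """Commits reachable from any ref by following parent links."""
        seen: set[str] = set()
        stack = list(self.refs.values())
        while stack:
            cid = stack.pop()
            if cid not in seen:
                seen.add(cid)
                stack.extend(self.parents.get(cid, []))
        return seen

    def gc(self) -> set[str]:
        """Drop unreachable commits and return the IDs that were removed."""
        dead = set(self.parents) - self.reachable()
        for cid in dead:
            del self.parents[cid]
        return dead


repo = ToyRepo()
repo.commit("a1", [])
repo.commit("b2", ["a1"])
repo.set_ref("main", "b2")
repo.commit("squashed", [])       # one commit replacing the whole history
repo.set_ref("main", "squashed")  # the force push
print("b2" in repo.parents)       # True: old history is still stored
print(sorted(repo.gc()))          # ['a1', 'b2']
```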

GitHub's API (or just the web UI) only makes these features more obvious. For example, you can find and inspect a commit referenced in MR comments even after it was force-pushed away.

> was deeply surprised about GitHub's brute-forceable short commit IDs

Short commit IDs are not a GitHub feature; they are a git feature.
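Abbreviation resolution is indeed git's own mechanism, but it only works against objects the resolver already has: locally that is your own object database, on a hosting service it is the server's store, which may still hold commits no branch references. A simplified sketch in which a plain list stands in for the object database (the function name is hypothetical):

```python
def resolve_prefix(prefix: str, known_ids: list[str]) -> str:
    """Resolve an abbreviated object ID against known full IDs, git-style.

    Returns the single match; raises if nothing or more than one ID matches.
    """
    prefix = prefix.lower()
    if len(prefix) < 4:
        raise ValueError("prefix shorter than 4 hex digits")
    matches = [oid for oid in known_ids if oid.startswith(prefix)]
    if not matches:
        raise KeyError(f"unknown revision {prefix}")
    if len(matches) > 1:
        raise ValueError(f"short object ID {prefix} is ambiguous")
    return matches[0]
```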

> If I use GitHub's web interface to click myself a repository with complex access rights, I assume they have concepts in place to thoroughly enforce these access rights.

Have you ever tried to make a private GitHub repository public? There is a clear warning that code, logs and activity history will become public. Maybe they should include an additional clause about forks there.


pcthrowaway 697 days ago

Unreferenced commits that haven't yet been garbage collected on a remote are not available to your local clones via git... I suppose there could be some obscure way to pull them from the remote if you know the hash (though I'm not actually sure), but either way (via the web interface or the CLI) you'd have to know the hash.

And it's completely reasonable to assume that no one outside the org would have those hashes while the repo was private.

It sounds like GitHub's antipattern here is retaining a log of all events, which may leak these hashes. That is really not something I'd expect a git user to anticipate.


Dylan16807 697 days ago

> Short commit IDs are not a GitHub feature; they are a git feature.

They're a local feature, sure. But locally you already have the list of commits; just open the .git directory.

Can you connect to a vanilla git server and enumerate every single hash?

> Maybe they should include an additional clause about forks there.

It would help, but they need much more than a clause about forks.

Ideally they would purge that extra data when making something public.


pcthrowaway 697 days ago

> Can you connect to a vanilla git server and enumerate every single hash?

If you have SSH shell access, yes, but I don't think you can do this with just git (and of course GitHub doesn't provide shell access to its git repo servers).
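Over the ordinary git protocol a client gets the server's ref advertisement, i.e. the object ID at the tip of each ref, which is what `git ls-remote <url>` prints. Commits that no ref points to are not part of that list, so a plain git client has no command to enumerate them. A small parser for that output; the function name and the sample IDs are illustrative.

```python
def parse_ls_remote(output: str) -> dict[str, str]:
    """Map ref name -> object ID from `git ls-remote` output.

    Each non-empty line is "<object id><TAB><ref name>".
    """
    refs: dict[str, str] = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        oid, ref = line.split("\t", 1)
        refs[ref] = oid
    return refs


sample = (
    f"{'1' * 40}\tHEAD\n"
    f"{'1' * 40}\trefs/heads/main\n"
    f"{'2' * 40}\trefs/tags/v1.0\n"
)
# Only ref tips appear; a squashed-away commit has no line of its own.
print(parse_ls_remote(sample))
```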

The public distribution of commit hashes via their event log seems really irresponsible on GitHub's part to me.


marcosdumay 697 days ago

Yes, even though I expect there are people who do exactly what the GP describes, if you know git it has severe "do not do that!" vibes.

Do not squash your commits and make the repository public. Instead, make a new repository and add the code there.
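A minimal sketch of this suggestion, assuming Python and the `git` CLI on PATH: copy the current tree without its `.git` directory, then start a fresh single-commit repository from the copy. The helper names are hypothetical. The copy includes everything in the working tree, untracked and ignored files too, so it still has to be reviewed for secrets before the result is pushed to a newly created repository.

```python
import shutil
import subprocess
from pathlib import Path


def snapshot_tree(src: Path, dst: Path) -> None:
    """Copy the working tree to dst (which must not exist), leaving .git and its history behind."""
    shutil.copytree(src, dst, ignore=shutil.ignore_patterns(".git"))


def fresh_repo_commands(message: str = "Initial public release") -> list[list[str]]:
    """git commands that turn the copied tree into a brand-new single-commit repository."""
    return [
        ["git", "init"],
        ["git", "add", "--all"],
        ["git", "commit", "-m", message],
    ]


def publish_copy(src: Path, dst: Path) -> None:
    """Snapshot src into dst, then create a new repository there with one commit."""
    snapshot_tree(src, dst)
    for cmd in fresh_repo_commands():
        subprocess.run(cmd, cwd=dst, check=True)
```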
