| HN Mirror


by SnowflakeOnIce 696 days ago

There seems to be no such thing as a "private fork" on GitHub in 2024 [1]:

> A fork is a new repository that shares code and visibility settings with the upstream repository. All forks of public repositories are public. You cannot change the visibility of a fork.

[1] https://docs.github.com/en/pull-requests/collaborating-with-...

4 comments

ff7c11 695 days ago

A fork of a private repo is private. When you make the original repo public, the fork is still a private repo, but the commits can now be accessed by hash.

CGamesPlay 695 days ago

According to the screenshot in the documentation, though, new commits made to the fork will not be accessible by hash. So private feature branches in forks may be accessible via the upstream that was changed to public, if those branches existed at the time the upstream's visibility changed, but new feature branches made after that time won't be accessible.

pcthrowaway 695 days ago

OK but say a company has a private, closed source internal tool, and they want to open-source some part of it. They fork it and start working on cleaning up the history to make it publishable.

After some changes which include deleting sensitive information and proprietary code, and squashing all the history to one commit, they change the repo to public.

According to this article, any commit on either repo that was made before the second repo was made public can still be accessed on the public repo.

reisse 695 days ago

> After some changes which include deleting sensitive information and proprietary code, and squashing all the history to one commit, they change the repo to public.

I know this might look like a valid approach at first glance but... it is stupid for anyone who knows how git or GitHub API works? The remote (GitHub's) reflog is not GC'd immediately; you can try to get commit hashes from the event history via the API, and then try to fetch those commits, which the remote still holds until they are garbage-collected.
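To make the "commit hashes from events history" step concrete, here is a minimal sketch of pulling hashes out of one push event record. The field names (`type`, `payload.before`, `payload.head`, `payload.commits[].sha`) are assumptions about how such records have been laid out and may not match what the API returns today; the sample record and its hashes are made up.

```python
import json

def push_event_shas(event: dict) -> list:
    """Collect every commit hash mentioned by one push event record.

    Assumed layout: {"type": "PushEvent", "payload": {"before", "head", "commits"}}.
    """
    if event.get("type") != "PushEvent":
        return []
    payload = event.get("payload", {})
    shas = []
    # "before" is the branch tip prior to the push. After a force-push it can
    # name a commit that no branch points to any more.
    for key in ("before", "head"):
        value = payload.get(key)
        if value and set(value) != {"0"}:  # skip the all-zero "no commit" id
            shas.append(value)
    for commit in payload.get("commits", []):
        sha = commit.get("sha")
        if sha:
            shas.append(sha)
    # Drop duplicates while keeping first-seen order.
    return list(dict.fromkeys(shas))

# Hypothetical record; the hashes are placeholders, not real commits.
SAMPLE_LINE = json.dumps({
    "type": "PushEvent",
    "payload": {
        "before": "a" * 40,
        "head": "b" * 40,
        "commits": [{"sha": "b" * 40}],
    },
})

if __name__ == "__main__":
    print(push_event_shas(json.loads(SAMPLE_LINE)))
```

The point is only that the hash of the overwritten tip shows up in the event stream, so "nobody knows the hash" is not something to rely on.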

Perseids 695 days ago

> it is stupid for anyone who knows how git or GitHub API works?

You need to know how git works and GitHub's API. I would say I have a pretty good understanding of how (local) git works internally, but was deeply surprised about GitHub's brute-forceable short commit IDs and the existence of a public log of all reflog activity [1].

When the article said "You might think you’re protected by needing to know the commit hash. You’re not. The hash is discoverable. More on that later." I was not able to deduce what would come later. Meanwhile, data access by hash seemed like a non-issue to me – how would you compute the hash without having the data in the first place? Checking that a certain file exists in a private branch might be an information disclosure, but is not usually problematic.

And in any case, GitHub has grown so far away from its roots as a simple git hoster that implicit expectations change as well. If I self-host my git repository, my mental model is very close to git internals. If I use GitHub's web interface to click myself a repository with complex access rights, I assume they have concepts in place to thoroughly enforce these access rights. I mean, GitHub organizations are not a git concept.

[1] https://www.gharchive.org/
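A small sketch of the two ideas in this comment, for illustration only. The first function shows why a full hash cannot be computed without the data: git hashes an object's type, length, and content together (shown here for a blob, using the classic SHA-1 object format). The second counts how many candidates a short hex prefix leaves. The prefix length a server actually accepts is not stated in this thread, so the `4` in the usage line is an assumption.

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    """Hash file content the way git does for a blob (SHA-1 object format)."""
    header = b"blob %d\x00" % len(data)  # "<type> <size>\0" precedes the content
    return hashlib.sha1(header + data).hexdigest()

def prefix_space(length: int) -> int:
    """Number of distinct hex prefixes of the given length."""
    return 16 ** length

if __name__ == "__main__":
    # The empty blob has a well-known id.
    print(git_blob_hash(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
    # Assumed prefix length of 4, purely for illustration.
    print(prefix_space(4))     # 65536
```

Guessing a full 40-character hash is hopeless, but a short prefix is a small enough space to enumerate, which is what makes the short IDs "brute-forceable".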

reisse 695 days ago

> You need to know how git works and GitHub's API.

No; just knowing how git works is enough to understand that force-pushing squashed commits or removing branches on the remote will not necessarily remove the actual data on the remote.

The GitHub API (or just using the web UI) only makes these features more obvious. For example, you can find and check a commit referenced in MR comments even if it was force-pushed away.

> was deeply surprised about GitHub's brute-forceable short commit IDs

Short commit IDs are not a GitHub feature; they are a git feature.

> If I use GitHub's web interface to click myself a repository with complex access rights, I assume they have concepts in place to thoroughly enforce these access rights.

Have you ever tried to make a private GitHub repository public? There is a clear warning that code, logs and activity history will become public. Maybe they should include an additional clause about forks there.

marcosdumay 695 days ago

Yes, even though I expect there to be people that do exactly what the GP describes, if you know git it has severe "do not do that!" vibes.

Do not squash your commits and make the repository public. Instead, make a new repository and add the code there.
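One way to read "make a new repository and add the code there" is: copy the working tree without its `.git` directory, then initialize a fresh repository in the copy. A minimal sketch of the copy step, assuming the sensitive files have already been deleted from the working tree (the function name and paths are hypothetical):

```python
import shutil
from pathlib import Path

def copy_snapshot(src, dest) -> Path:
    """Copy a working tree to a new directory, leaving all git metadata behind.

    The destination must not exist yet. Entries named ".git" are skipped at
    every level, so no old objects, refs, or reflogs come along.
    Note: this does not consult .gitignore, so untracked files are copied too.
    """
    shutil.copytree(src, dest, ignore=shutil.ignore_patterns(".git"))
    return Path(dest)

if __name__ == "__main__":
    # Hypothetical paths; review the result before running `git init` in it.
    out = copy_snapshot("internal-tool", "internal-tool-public")
    print(f"snapshot written to {out}")
```

Because the new repository starts from plain files, there is no shared history or fork relationship for old commits to leak through.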

sickblastoise 695 days ago

Why not just create a new public repo and copy all of the source code that you want to it?

cutemonster 695 days ago

Because they haven't read the article and this HN discussion?

"Why not just...". Once you already know something it can seem obvious.

sickblastoise 693 days ago

What?

Log_out_ 695 days ago

ChatGPT, given the following repo, create a plausible, perfect commit history to create this repository.

itsgabriel 695 days ago

Funnily enough the docs are wrong: the GitHub CLI allows changing a fork's visibility. https://stackoverflow.com/a/78094654/12846952

rkagerer 695 days ago

Am I the only one who finds this conceptually confusing?

rocqua 695 days ago

Nope, me too. The whole Repo network thing is not User facing at all. It is an internal thing at GitHub to allow easier pull requests between repos. But it isn't a concept git knows, and it doesn't affect GitHub users at all except for this one weird thing.

brazzledazzle 695 days ago

I may be recalling incorrectly but I seem to remember it having some storage deduplication benefits on the backend.

tedmiston 691 days ago

> Nope, me too. The whole Repo network thing is not User facing at all.

There are some user-facing parts: You can find the fork network and some related bits under repo insights. (The UX is not great.)

https://github.com/apache/airflow/forks?include=active&page=...

Manuel_D 696 days ago

Not through the GitHub interface, no. But you can copy all files in a repository and create a new repository. IIRC there's a way to retain the history via this process as well.

JyB 696 days ago

That’s beside the point. The article is specifically about "GitHub forks" and their shortcomings. It’s unrelated to pushing to distinct repositories not magically "linked" by the GH "fork feature".

mckn1ght 696 days ago

You can create a private repository on GitHub, clone it locally, add the repo being "forked" from as a separate git remote (I usually call this one "upstream" and my "fork", well, "fork"), fetch and pull from upstream, then push to fork.

shkkmo 696 days ago

All you should have to do is just clone the repo locally and then create a blank GitHub repository, set it as the/a remote and push to it.

make3 696 days ago

That's not the GitHub concept / almost trademark of "fork" anymore though, which is what your parent was talking about

a1o 696 days ago

I mean it's git, just git init, git remote add for origin and upstream, origin pointing to your private, git fetch upstream, git push to origin.
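The sequence described above (init, add both remotes, fetch upstream, push to origin) can be sketched as a list of git invocations. The URLs and the branch name `main` are placeholders, and the explicit refspec in the last step is my own choice for pushing a fetched branch without checking it out first; treat this as a sketch, not the one right way to do it.

```python
import subprocess

def private_copy_commands(upstream_url: str, private_url: str, branch: str = "main"):
    """Build the git commands for a manual, unlinked private copy of a repo."""
    return [
        ["git", "init"],
        ["git", "remote", "add", "origin", private_url],     # your private repo
        ["git", "remote", "add", "upstream", upstream_url],  # the repo being copied
        ["git", "fetch", "upstream"],
        # Push the fetched branch straight to the private remote.
        ["git", "push", "origin", f"refs/remotes/upstream/{branch}:refs/heads/{branch}"],
    ]

def run_all(commands, cwd: str) -> None:
    """Run each command in order inside cwd, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, cwd=cwd, check=True)

if __name__ == "__main__":
    # Placeholder URLs; printing instead of running keeps this side-effect free.
    for cmd in private_copy_commands(
        "https://example.com/upstream/project.git",
        "https://example.com/me/project-private.git",
    ):
        print(" ".join(cmd))
```

Since the private copy is created by a plain push rather than the fork button, it is not part of the upstream's fork network.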