| HN Mirror



	by tekacs 22 days ago
	Maybe I'm missing something really obvious, but... 3,800 repos? I guess I find it kind of surprising they have that many!

16 comments

PAndreew 22 days ago

As others have said, it's just a fraction. I'm in a medium-size tech-related company and we have 7500+ in one GitHub org. We have two orgs, so altogether easily 10K+. Of course most of it is stale, obsolete, sandbox, personal tools, etc. I wouldn't be surprised if GitHub had 100K+ internal repos or even more.

htrp 22 days ago

no pruning of repos?

sbarre 22 days ago

Not OP, but I used to work at a large company with a similar number of repos.

When I left about a year ago, we had just started (after being on Github for almost 8 years) an ongoing project of first archiving old/outdated repos in place, and then moving them to an "archived" sub-org, and waiting to see if anyone complained.

Previously no one wanted to outright delete or remove repos because of the risk that someone somewhere was relying on it, and also there was no actual downside to just leaving them there (no cost savings, no imminent danger other than clutter, etc), so resources were never allocated to do it. There was always something more important to work on.

In an org with a higher floor of engineering management, a proactive program for removing unused or outdated repos would absolutely be expected, though, I think.

a_t48 22 days ago

This is a continual fight for me. At nearly every company I've had to compromise on using a graveyard repo for packages within a monorepo, even though git has the whole history already.

Sander_Marechal 21 days ago

The problem with history is that you need to know when to look. If you're looking for some old code that you know existed but you don't know exactly what it was, you can't just browse to go and find it.

a_t48 21 days ago

Sure, but beyond a certain point the code that's there isn't just drop in compatible.

NewJazz 22 days ago

Gitlab is so nice for this. You can group repos together so it is harder to lose track of stale projects.

fn-mote 22 days ago

Breaks old stuff

philipp-gayret 22 days ago

I worked for a food retail store once. I remember going in the first day wondering, how hard can it really be... From the outside, it looks like they have a simple website. The website to order things on was an amalgamation of 300+ repos. GitHub lost less in this breach. It takes a lot of effort to keep things simple as you grow.

robotnikman 22 days ago

Can confirm as someone working in the same field, we have a ton of repos

ashishb 22 days ago

Uber had 8000 repos at one point with 2000 engineers - https://highscalability.com/lessons-learned-from-scaling-ube...

Gigachad 22 days ago

Probably most of them are forks of some public repo with some patch applied and half of those are probably not even used internally anymore.

ashishb 22 days ago

Afaik, they eventually cleaned it up.

And it was each team owning multiple internal repos of their own deployments/libraries, and not, primarily, clones of public repos.

ryanhecht 22 days ago

Something cool that I've always liked about working at GitHub is how much of the company _runs on GitHub_ -- a lot of teams, even non-technical teams, have their own repos just to organize docs/SOPs/designs/etc., like a traditional knowledge work company might use SharePoint.

tempay 22 days ago

Personally I have over a hundred, especially from quick prototypes, studies or instances of templates so I can easily see how over 18 years and many hundreds of employees you end up with thousands.

MrDarcy 22 days ago

3800 is low for an org like GitHub. Glad it’s highly likely not all their repos are compromised.

organsnyder 22 days ago

Given the attack vector, it's possible that the impacted repos were ones that see more activity.

jamesfinlayson 22 days ago

I remember working at a company with at least 5,000 repos across five or six GitHub orgs, plus more stuff in Perforce.

Probably some old experiments in there but the company had its fingers in a few pies and some departments didn't mind creating yet another service to solve a problem.

I definitely archived the old stuff in my department (we had eight repos and that felt like enough for three people).

dgellow 22 days ago

I was part of an org with more than 15k repos

clutch_coder99 21 days ago

Damn, that's a lot. I'm wondering how many engineers handled all that?

dgellow 21 days ago

I cannot share many details, but one thing: livegrep had no issues handling that many repos! That project is such a blessing

https://github.com/livegrep/livegrep

dirkc 21 days ago

That was my first instinct, but thinking about it just a little it doesn't seem crazy, esp for GitHub.

How many folders do you have on your computer with some bits of code? It's probably not a terrible practice to add those folders to GitHub.

Across a big engineering company that can easily add up to way more than 3,800!

newsoftheday 22 days ago

It sounds low to me, I worked at a Fortune high number a few years ago and they had more.

skissane 22 days ago

In my personal experience, give it a decade or two, and any corporation will accumulate hundreds (or even thousands) of abandoned internal repos containing discontinued services, POCs/prototypes that never went anywhere, etc – people forget to archive them, or aren't sure whether something is still in use or not so err on the safe side.

AI is making this even worse. With coding agents, anyone can throw together a quick internal prototype of any idea they have, even if it has no hope of ever making it to production.

unix4ever 22 days ago

Maybe AI will make it better, though: assign agents to monitor, maintain, and keep repos up to date, or have them refer repos via A2A to another agent that disposes of them in accordance with company requirements. I actually think AI will greatly help with this type of problem.

skissane 22 days ago

Autoarchiving repos which nobody has used in X years doesn’t require any AI - you can just write a bot to do it. People don’t, because it isn’t a priority. AI can make writing such a bot a bit easier, but can’t help much with getting approval from the powers that be to run it.
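For illustration, the selection half of such a bot could be sketched roughly as below. This is only a minimal sketch under assumptions: the `Repo` record, the `select_stale` helper, and the sample data are all hypothetical, and it treats "nobody has used it" as "no push in X years". The actual archiving step (for example, setting `archived` to true through the hosting provider's API) and the approval process are left out.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Repo:
    # Hypothetical record; a real bot would fill this from the hosting API.
    name: str
    pushed_at: datetime
    archived: bool = False


def select_stale(repos, now, max_idle_years):
    """Return names of unarchived repos with no push in the last max_idle_years."""
    # Approximates a year as 365 days, which is good enough for a cleanup policy.
    cutoff = now - timedelta(days=365 * max_idle_years)
    return [r.name for r in repos if not r.archived and r.pushed_at < cutoff]


# Made-up sample data, purely to show the selection logic.
NOW = datetime(2025, 1, 1, tzinfo=timezone.utc)
REPOS = [
    Repo("old-prototype", datetime(2018, 3, 1, tzinfo=timezone.utc)),
    Repo("active-service", datetime(2024, 12, 1, tzinfo=timezone.utc)),
    Repo("already-archived", datetime(2015, 1, 1, tzinfo=timezone.utc), archived=True),
]

STALE = select_stale(REPOS, NOW, max_idle_years=3)
print(STALE)  # candidates a bot would archive, or just report for review
```

Printing the candidate list first, rather than archiving directly, fits the point above: the code is the easy part, and a dry-run report is what you would bring to whoever has to approve running it.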

wazHFsRy 21 days ago

Even their sales teams work with GitHub repos, so not that surprising I’d say.

paulddraper 22 days ago

They have 800 engineers. So 3,800 repos is high, but not crazy.

Some of those could be forks.

eddythompson80 22 days ago

really? I mean these are internal repos. Probably most of them are random one-off experiments or a place to park code. Google has 2,900 "public" repos on github. Microsoft has ~8k "public" on github too. Can't even imagine how many they have on their internal systems.

noelsusman 22 days ago

Am I missing the joke here... they have hundreds of millions of repos.

dijit 22 days ago

I think they mean that these are internal github-org repos.

The ones used for running the site itself.

Though, it's so many that I think there are some customer ones in there too.

nightpool 22 days ago

No, there's no joke, you might have just misread the article (the 3,800 number is the number of internal GitHub repos the employee had downloaded on their personal computer / had access to on their own GitHub account)

Galanwe 22 days ago

The breach is about internal repositories, not user repositories.

sunshine-o 21 days ago

Because everything in Github is designed for growth: Easy to create a repo, very hard to delete it (a lot of scrolling, clicking, copy/pasting the full name of the repo, etc.) I mean "Deleting", not "Archiving".

MS and GitHub need their numbers to go up, not people cleaning up their repos to avoid loose ends.

I have hundreds of them, and it took me a few hours to delete the unused ones. In a medium-size org with thousands of them, it will take weeks for security to do a cleanup.