Hacker News
by floating-io 314 days ago
My fear of this sort of thing happening is why I don't use github or gitlab.com for primary hosting of my source code; only mirrors. I do primary source control in house, and keep backups on top of that.

It's also why nothing in my AWS account is "canonical storage". If I need, say, a database in AWS, it is live-mirrored to somewhere within my control, on hardware I own, even if that thing never sees any production traffic beyond the mirror itself. Plus backups.

That way, if this ever happens, I can recover fairly easily. The backups protect me from my own mistakes, and the local canonical copies and backups protect me from theirs.

Granted, it gets harder and more expensive with increasing scale, but it's a necessary expense if you care at all about business continuity issues. On a personal level, it's much cheaper though, especially these days.

I once said to the CTO of the company I worked for, "Do we back up our source code?"

He said, "no, it's on github".

I said no more.

I would suggest your CTO needs some gentle reminders about risk management and how the cloud really works.

If your boss is that daft, it's probably a sign to bail out. Remember to do your own personal due diligence. Due dill is not something that you just do for someone else: do your own! Do your own personal risk assessment. If code was lost, who would be held accountable? You or them?

EDIT: PS - I am a CTO ...

But it's github ...... it can't be lost.

Unless github closes the account, or a hacker gets access, or a rogue employee gets mad and deletes all, or some development accident results in repo deletion, or etc etc.

Or if the one employee who created the account, and was paying for it on his personal credit card, got laid off.

And no one else in the company knew what GitHub was.

Is there a story behind this oddly specific comment?
Not op but I saw this happen kinda maliciously...

Company was founded by a "visionary" CEO who is a sculptor and doesn't understand tech.

Most crucial employee is young guy hired right out of college.

Company now has 50+ developers, but the critical software is still made by that guy several years later.

Critical guy still gets a recent-graduate salary, notices he is actually the most important person in the company, and asks for a raise.

Company denies his raise. He quits. Company hires a bunch of developers from cheap third world countries to replace him...

Company learns:

1. Everything the guy wrote shortly before he asked for a raise was on his personal GitHub, not the company's.

2. That software is written in other programming languages, not the one the company normally uses.

3. Everything the guy wrote since he joined the company is extremely difficult to maintain. The guy is a genius and all his code is correct, clean and well made, but his thought patterns and how they end up in the code are just too different, and everyone who worked with him before was also a genius and didn't care that the code was "crazy".

Note: the "other geniuses" mentioned above also quit when other companies made them great offers and the stingy employer refused them a raise too.

Happened where I used to work.

The guy writing scripts for a hardware test platform hosted them on a paid GitHub account.

System would pull the latest files off GitHub until the account ran out and the whole thing broke, months after he was gone.

Gotta add AI agents to that list
I don't understand how the VC financiers got so rich while being so stupid as to hire such stupid people at the executive level, e.g. your CTO.
If only 10 out of 100 of your investments make it, does it matter if one of the 90 failed because of lacking backups? Their risk strategy is diversification of investments. Not making each investment itself bulletproof.
Yes, it is in effect a gamble. The issue is that this strategy doesn't really prove profitable for the majority of VCs. Less than 30% of VCs get to a unicorn or IPO deal. 46% of VCs don't profit at all. This is as per the recent post "(Only) half of senior VCs make at least one successful deal.". I am even ignoring the ones who drop out and don't contribute to the active statistics.

The strategy is about as silly as having ten babies and expecting that one of them will make it. It is what you would expect out of the worst poverty-ridden parts of Africa.

An alternative is to select and nurture your investments really well, so the rate of success is much higher. I'd like to see the script flipped, whereby 90% of investments go on to become profitable, with stable cash income preferred over big exits.

If nobody has the repo checked out, what are the odds it's important?
> If nobody has the repo checked out, what are the odds it's important?

Oh boy.

Tons of apps in maintenance mode run critical infrastructure and see few commits in a year.

And the people using it multiple times a year delete it afterwards?
Relying on random local copies as a backup strategy is not a strategy.
They often only have a binary that you would have to reverse engineer. Source code gets lost.

To step outside just utility programs, the reason why Command & Conquer didn't have a remaster was:

> I'm not going to get into this conversation, but I feel this needs to be answered. During this project of getting the games on Steam, no source code from any legacy games showed up in the archives.

> And the people using it multiple times a year delete it afterwards?

The people wouldn't, but in the environments I'm thinking of, security policies might.

What you're leaning into is a high-risk backup strategy that would rely mostly on luck to get something remotely close to the current version back online. It's pretty reckless.

> The people wouldn't, but in the environments I'm thinking of, security policies might.

In environments that go so far (deleting local checkouts of code out of security concerns), I bet they do have a mirror/copy of the version controlled code.

More like “none of the people who worked on it are at the company any more”
Devs clean up their workstation sometimes. You can get fancy about deleting build artifacts or just remove the whole directory. Devs move to new machines sometimes and don't always transfer everything. Devs leave.

Software still runs, and if you don't have the source then you'll only have the binary or other build artifact.

Popularity != importance. There is plenty of absolutely critical FOSS code that receives very little maintenance and attention, yet is mission critical to society functioning efficiently. And the same happens in organizations too, with say their bootloader for firmware of hardware products.
You clearly haven't worked much with code over many years. When laptops change, not all existing projects get checked out.

In fact, in VSCode, one can use a project without cloning and checking it out at all.

Honestly I'm just really wondering what the odds are. In particular for code that made it onto git.
Over the long term, the odds reach 100% that it won't be checked out. That's because people mostly only work on newer projects. Mature older projects, even if they're running in production, cease to see many (or any) updates, so they don't get cloned onto newer laptops. This doesn't mean that the older projects are now less important, because if they ever need to be re-deployed to production, only the source code will allow it.
> In particular for code that made it onto git.

By “onto git”, do you mean “onto GitHub”? I really wish people would stop conflating the two.

In fact, I can't tell whether this confusion is just another symptom of, or a (major?) part of the reason for, why we're in the mess we're in.

I do not mean that, and I was not conflating anything.

Just making and committing to a repo at all is a step that implies a certain level of caretaking.

If the repo is on GitHub and two or more developers keep reasonably up-to-date checkouts on their local computers, the “3-2-1” principle of backups is satisfied.

In addition, if any of those developers have a backup strategy for their local computer, those backups also count as copies of that source code.
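
A minimal sketch of that accounting, for illustration only. The 3-2-1 rule is usually stated as three copies, on two kinds of media, with one offsite. The `Copy` fields, the media labels, and the choice to discount shallow or stale checkouts are assumptions made for this example, not part of any standard tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Copy:
    name: str                  # human-readable label for the copy
    medium: str                # assumed label, e.g. "cloud" or "laptop-ssd"
    offsite: bool              # True if it lives outside your own premises
    full_history: bool = True  # False for shallow or badly out-of-date clones


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """Three usable copies, on at least two media, with at least one offsite."""
    usable = [c for c in copies if c.full_history]
    media = {c.medium for c in usable}
    return len(usable) >= 3 and len(media) >= 2 and any(c.offsite for c in usable)


# The scenario from the comment: GitHub plus two developer checkouts.
copies = [
    Copy("GitHub", "cloud", offsite=True),
    Copy("dev laptop A", "laptop-ssd", offsite=False),
    Copy("dev laptop B", "laptop-ssd", offsite=False),
]
print(satisfies_3_2_1(copies))
```

Whether a hosted repo and a laptop really count as two different "media" is a judgment call. The sketch also shows how fragile the count is: mark one laptop as `full_history=False` and the rule no longer holds.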

CTO explaining that to the CEO when your source code is completely gone:

CTO: "I know our entire github repo is deleted and all our source code is gone and we never took backups, but I'm hoping the developers might have it all on their machines."

CEO: "Hoping developers had it locally was your strategy for protecting all our source code?"

CTO: "It's a sound approach and ticks all the boxes."

CEO: "You're fired."

Board Directors to CEO: "You're fired."

Technically true, but only if we consider dev checkouts as "backups". In the majority of cases they probably are, but that's not guaranteed. The local repo copy could be in a wildly different state than the primary origin, a shallow clone, etc... While the odds of that mattering are very low, they're not zero. I personally prefer to have a dedicated mirror as a failsafe.
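
A rough sketch of both halves of that point, under stated assumptions. Git records the boundary of a shallow clone in a file named `shallow` inside the git directory, so a checkout can be tested for it without running git. The mirror commands use `git clone --mirror` and `git remote update --prune`. The function names and the example URL and path are hypothetical.

```python
from pathlib import Path


def is_shallow_clone(repo: Path) -> bool:
    """Return True if the repository at `repo` looks like a shallow clone.

    Checks for the `shallow` file in the git directory. Worktrees and
    submodules, where `.git` is a file rather than a directory, are not
    handled by this sketch.
    """
    git_dir = repo / ".git"
    if not git_dir.is_dir():
        git_dir = repo  # assume a bare or mirror repository
    return (git_dir / "shallow").is_file()


def mirror_commands(url: str, dest: str) -> dict[str, list[str]]:
    """Build (but do not run) the commands for a dedicated mirror."""
    return {
        # One-time setup: a bare copy of every ref.
        "create": ["git", "clone", "--mirror", url, dest],
        # Periodic refresh; --prune also drops refs deleted upstream.
        "refresh": ["git", "-C", dest, "remote", "update", "--prune"],
    }


# Hypothetical URL and destination, for illustration only.
commands = mirror_commands("git@github.com:example/project.git", "/srv/mirrors/project.git")
```

Note that `--prune` makes the mirror follow upstream deletions, so a mirror refreshed this way protects against losing the host, not against someone deleting branches on it.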
The benefit of DVCS. Losing the source code from github when it's all on local computers is the least of problems.
LMAO. Must be one of those MBA CTOs. At least mirror the crown jewels to Bitbucket, Tarsnap, or somewhere else that keeps 2 weeks to 3 months' worth of independent copies made daily.

If not MBA, the problem may also stem from the gradual atrophy and disrespect shown towards the sysadmin profession.

I mean it’s also on everybody’s laptop. Recovering from GitHub going away would be trivial for me
Exactly. Technofeudal overlords can switch off all "your" stuff at any time. Always have a personal and a business disaster recovery plan including isolated backups (not synchronized replication) on N >= 2 separate services/modalities.

Options to consider for various circumstances include:

- Different object storage clouds with different accounts (different names, emails, and payment methods), potentially geographically different too

- Tarsnap (which uses AWS under the hood, but under someone else's account(s))

- MEGA

- Onsite warm and/or cold media

- Geographically separate colo DR site, despite the overly-proud trend of "we're 100% (on someone else's SPoF) cloud now"

- Offsite cold media (personal home and/or IronMountain)

How do you distinguish a mirror from not a mirror on GitHub?

I often have my git configured to push to multiple upstreams, this means that basically all of your mirrors can be primaries.

This is a really good part about GitHub. Every copy is effectively a mirror, too, and it's cryptographically verified as well, so, you don't have to worry about the mirror going rogue without anyone noticing.
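
For anyone wanting to try the multiple-upstream setup, here is a minimal sketch that only builds the commands. It relies on `git remote set-url --add --push`, which appends a push URL to a remote. The helper names and both repository URLs are hypothetical.

```python
import subprocess


def push_url_commands(remote: str, urls: list[str]) -> list[list[str]]:
    """Build git commands so that `git push <remote>` pushes to every URL.

    Once any push URL is configured, git stops pushing to the remote's
    fetch URL implicitly, so the primary host has to be in `urls` too.
    """
    return [["git", "remote", "set-url", "--add", "--push", remote, url] for url in urls]


def apply_commands(repo_dir: str, commands: list[list[str]]) -> None:
    """Run the commands inside an existing clone (not called in this sketch)."""
    for cmd in commands:
        subprocess.run(cmd, cwd=repo_dir, check=True)


# Hypothetical hosts, for illustration only.
commands = push_url_commands(
    "origin",
    [
        "git@github.com:example/project.git",
        "git@gitlab.example.com:example/project.git",
    ],
)
for cmd in commands:
    print(" ".join(cmd))
```

After this, a single `git push origin` sends the same commits to both hosts, while fetches still come from the original URL only.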

I use GitLab locally and push only to that. GitLab itself is configured to mirror outbound to the public side of things.

In a collaborative scenario, doing it that way makes sure everything is always properly synchronized. Some individual's lacking config can't break things.

IMHO lawyers get creative. A GitHub account can show a ton of work activity, NDA violations, etc. Your "private repo" is just a phone call away from being a public repo.
Your whole GitHub account is a phone call away from being suspended due to frivolous IP/DMCA/what-have-you claims.
Personally, I'm concerned that my git repositories exist on my own host, the same host which has the SSH key to push to all the public mirrors.

I wish there were some service which would _pull_ my public git repositories, but not allow me to delete anything without a ~90-day waiting period.
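
The waiting-period part of that wish is simple to express. Below is a minimal sketch of the bookkeeping only, assuming a hypothetical pull-based service: the `DelayedDeleteMirror` class and its method names are invented for illustration, and the pulling and storage are left out entirely.

```python
from datetime import datetime, timedelta, timezone

WAITING_PERIOD = timedelta(days=90)


class DelayedDeleteMirror:
    """Tracks deletion requests and refuses to purge before the wait is over."""

    def __init__(self) -> None:
        self._pending: dict[str, datetime] = {}  # repo name -> time of first request

    def request_delete(self, repo: str, now: datetime) -> datetime:
        """Record a deletion request and return the earliest purge time.

        Repeating the request does not restart or shorten the wait.
        """
        requested = self._pending.setdefault(repo, now)
        return requested + WAITING_PERIOD

    def cancel_delete(self, repo: str) -> None:
        """Withdraw a pending deletion, e.g. after noticing a compromised key."""
        self._pending.pop(repo, None)

    def may_purge(self, repo: str, now: datetime) -> bool:
        """True only if deletion was requested at least 90 days before `now`."""
        requested = self._pending.get(repo)
        return requested is not None and now - requested >= WAITING_PERIOD


mirror = DelayedDeleteMirror()
t0 = datetime(2025, 1, 1, tzinfo=timezone.utc)
mirror.request_delete("project", t0)
print(mirror.may_purge("project", t0 + timedelta(days=30)))  # still inside the wait
```

The point of the design is that an attacker holding the push key can request deletion but cannot make it take effect quickly, which leaves the owner time to cancel.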

> Granted, it gets harder and more expensive with increasing scale, but it's a necessary expense if you care at all about business continuity issues. On a personal level, it's much cheaper though, especially these days.

I don't go as far as "live mirror", but I've been advocating _for years_ on here and in meatspace that this is the most important thing you can be doing.

You can rebuild your infrastructure. You cannot rebuild your users' data.

An extended outage is bad but in many cases not existential. In many cases customers will stick around. (I work with one client that was down over a month without a single cancellation because their line-of-business application was that valuable to their customers.)

Once you've lost your users' data, they have little incentive to stick around. There's no longer any stickiness as far as "I would have to migrate my data out" and... you've completely lost their trust as far as leaving any new data in your hands. You've completely destroyed all the effort they've invested in your product, and they're going to be hesitant to invest it again. (And that's assuming you're not dealing with something like people's money where losing track of who owns what may result in some existence-threatening lawsuits all on its own.)

The barrier to keeping a copy of your data "off site" is often fairly small. What would it take you right now to set up a scheduled job to dump a database and sync it into B2 or something?
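
To give a sense of how small that job can be, here is a minimal sketch under loud assumptions: SQLite stands in for whatever database you actually run, and `upload` is a hypothetical callable that you would point at B2 or any other off-site store. Nothing here talks to a real storage service.

```python
import gzip
import sqlite3
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable


def dump_database(db_path: Path, out_dir: Path, now: datetime) -> Path:
    """Write a gzip-compressed SQL dump of a SQLite database and return its path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Timestamped name, so each run produces an independent copy.
    target = out_dir / f"{db_path.stem}-{now:%Y%m%dT%H%M%SZ}.sql.gz"
    conn = sqlite3.connect(db_path)
    try:
        with gzip.open(target, "wt", encoding="utf-8") as fh:
            for statement in conn.iterdump():
                fh.write(statement + "\n")
    finally:
        conn.close()
    return target


def backup_once(db_path: Path, out_dir: Path, upload: Callable[[Path], None]) -> Path:
    """Dump the database, then hand the dump to the (hypothetical) uploader."""
    dump = dump_database(db_path, out_dir, datetime.now(timezone.utc))
    upload(dump)  # assumption: this copies the file somewhere off site
    return dump
```

Run `backup_once` from cron or a systemd timer and the scheduling is done. The off-site account should not be able to delete or overwrite older dumps, otherwise this is synchronized replication rather than a backup.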

Even if that's too logistically difficult (convincing auditors about the encryption used or anything else), what would it take to set up a separate AWS account under a different legal entity with a different payment method that just synced your snapshots and backups to it?

Unless you're working on software where people will die when it's offline, you should prioritize durability over availability. Backups of backups are more important than your N-tier Web-3 enterprise scalable architecture that allows deployment over 18*π AZs with zero-downtime failover.

See, as a case study, Unisuper's incident on GCP: https://www.unisuper.com.au/about-us/media-centre/2024/a-joi...