Hacker News
by tw4l 2170 days ago
As David Rosenthal (formerly of Sun, NVIDIA, and Stanford) explains, the actual Arctic Code Vault is a PR stunt, and has almost no chance of helping anyone in any kind of realistic disaster scenario: https://blog.dshr.org/2019/11/seeds-or-code.html

That said, the rest of the project, which focuses on preserving several independent copies of repositories hosted on GitHub with a handful of partner organizations, is quite useful. From the same post: "They are using a range of technologies, making feeds available over the Internet, and partnering with the Internet Archive, the Software Heritage Foundation and the Bodleian Library. These are mostly things which will get used in the foreseeable future, and should be applauded for that reason."

>> They drag the 200 platters out into the 24hr sunshine, plug the solar panel into the Raspberry Pi, point its camera through a magnifying glass at the first frame, and let the QR app they happen to have on the Pi's micro-SD card do its thing. A couple of seconds later they have the first 2,900 bytes on the USB drive. It takes another couple of seconds to move to the next frame by hand. So they sit there for 383 days scanning a frame every 4 seconds to decode the entire archive. Except there's only sunshine enough for the Pi half the year, so it takes rather more than two years. Then they need to start the Pi building all that code...

>> Of course, this is ridiculous. No-one will decode this archive in the foreseeable future.
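
For anyone who wants to check the arithmetic in the quoted scenario, here is a rough sketch in Python. It uses only the numbers from the quote (383 days, one frame every 4 seconds, 2,900 bytes per frame, sunshine half the year). The function name and parameters are made up for illustration, and it assumes round-the-clock scanning during the sunlit period. The byte total is just what those quoted figures imply, not a statement about the real archive's size.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def scan_estimate(days=383, seconds_per_frame=4, bytes_per_frame=2900,
                  sunlit_fraction=0.5):
    """Back-of-envelope numbers for the quoted scanning scenario.

    All defaults come from the quoted comment; nothing here is measured.
    """
    # Frames scanned if one frame is read every `seconds_per_frame` seconds,
    # nonstop, for `days` days.
    frames = days * SECONDS_PER_DAY // seconds_per_frame
    # Total payload implied by the quoted per-frame size.
    total_bytes = frames * bytes_per_frame
    # Calendar time once you only get usable sunshine part of the year.
    calendar_days = days / sunlit_fraction
    return frames, total_bytes, calendar_days


frames, total_bytes, calendar_days = scan_estimate()
print(f"frames scanned:   {frames:,}")
print(f"bytes recovered:  {total_bytes:,} (~{total_bytes / 1e9:.1f} GB)")
print(f"calendar time:    {calendar_days:.0f} days "
      f"(~{calendar_days / 365.25:.1f} years)")
```

Doubling 383 days gives 766 days, which lines up with the "rather more than two years" in the quote.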

Yes, no one will be digging code out of Github right after the apocalypse. But what about 200 years after the apocalypse? Or maybe just 1,000 years from now, no apocalypse needed? I could see the archive being of immense historical value.

> "But what about 200 years after the apocalypse? Or maybe just 1,000 years from now, no apocalypse needed?"

-Thanks to flash memory cell charge leakage, I'd be surprised if the micro-SD card or USB drive kept its data for more than 3-5 years. They're designed for low cost, not longevity.

-The electrolytic caps will probably have dried out and failed by 50-100 years.

-The plasticizers used will have evaporated away by a century, leaving any plastic or rubber components brittle and crumbly.

-The lead free solders used in modern electronics are prone to the "tin whiskers" phenomenon. Not sure about the mitigations or timeframe for growth but a couple centuries is far, far longer than any reasonable design timeframe, making it a distinct possibility in my mind.

-At 1000 years, I'd wonder about diffusion effects in chips wrecking the circuits. It would be interesting to do a calculation to see how long that would take for an unpowered chip at room temperature.

Right, so it wouldn't be a 2020 computer. It would be whatever new computer they've built.

By then, why would they need code from GitHub? Given that they won't even be able to run it in any shape or form.

To study history and culture. And who knows, there may well be algorithms we came up with but which no one ever re-discovered.

The Pi isn't going to work after 200 years. Its flash will be wiped. Never mind aging on all the other parts.

Presumably, the Tech Tree will have some way of bootstrapping a system capable of decoding the tapes. They say in the introduction that it’s nearly useless to access the tapes without a computer and that they expect whoever is reading this to have a computer that is centuries more advanced than what we have now.

Maybe they just zip tied a ThinkPad to the tape reader and pray that it can eat whatever happens to it in the vault.

Archive Program director here - it's really not a PR stunt, we genuinely believe it will be of significant historical value, and that there's quite a good chance it will be of practical value.

Much of that is "if we forget technology which we realize somewhere down the road we actually might want to use again." History provides plenty of examples of this, and it's particularly important with a technology which mostly lives on ephemeral media that only lasts a few decades.

Even if you do expand your speculation to post-disaster scenarios, though, while it's true the archive wouldn't be an instant reset button, it would help greatly accelerate the recovery of technology. It's worth noting that it will come with a slew of (human-readable, not encoded) technical works regarding subjects ranging from modern software engineering to microprocessor design to photolithography to power systems, which we call the Tech Tree, along with a guide and index to all the stored repos. Wherever its inheritors / discoverers may be in terms of technological advancement, and especially if they have modern-ish hardware (which can last much, much longer than most storage media), recovering the archive's contents will be a lot faster than rediscovering them from scratch.

(Also worth noting we'll be storing "greatest hits" copies of the ~15,000 most-starred / most-relied-on repos, along with a sampling of several thousand repos with few/no stars, in a selection of places like Oxford's Bodleian Library; our hypothetical future tech seekers won't have to go all the way to Svalbard for those.)

I don't want to stress the doomsday scenarios too much, though, despite our ongoing pandemic. I think the most likely outcome by far is that progress will continue; the archive may be useful to recover a couple of otherwise forgotten technologies that suddenly become important / interesting; and it will ultimately be chiefly of interest to historians. That historical value is a key reason why it casts such a broad net. I too have a couple of fairly unsophisticated pet projects in there that the future won't be interested in individually - but collectively is another matter. One of the most interesting things our advisory committee told us is that history is replete with lists composed by wealthy people of the books they thought most important, carefully preserved for posterity, whereas what modern historians _really_ want is ordinary people's shopping lists, of which almost none survived. That's one reason there are millions of repos in the Arctic now, instead of eg just the most-starred 100K: some of those may be the modern technological equivalent of Renaissance shopping lists, for the historians who may take a particular interest in this (possibly) especially wacky and volatile era.

I know it's an inherently cinematic and dramatic project and so it's tempting to call it a PR stunt ... but I assure you, it's not, and, speaking personally, I would never have gotten involved with it if I thought it was.

People have some legitimate and some less legitimate criticisms here, in the HN comments section of course, but I for one think this is a fantastic effort and I'm pleasantly surprised to read what the new badge I saw on my profile yesterday is actually about.

There will always be "negative Nancies" -- especially here, they are everywhere -- but personally I'd just like to say thanks for having some vision outside of the normal day-to-day of making money for shareholders and keeping regular customers happy. More of this, please.

Did people with repositories know this was going to happen and did you give them a choice to opt out?
Rather more eloquently asked than by the other person I saw querying this[0]! I suspect it's covered under Github's TOS - specifically[1], only public repositories were included and these are all effectively just backups. Especially in the case of the vault in Svalbard. But you can opt out of the 'warm storage'[0].

[0] https://github.com/github/archive-program/issues/36

[1] https://docs.github.com/en/github/site-policy/github-terms-o...

I recognize they wouldn't have done it unless they felt confident of having the legal right, but it's just bad manners not to ask first.

If that's the case, this not-a-PR stunt degraded my impression of them.

I'm quite certain this isn't what their customers contemplated when reading "backup" in their ToS.

EDIT: Interestingly it says "This license does not grant GitHub the right to sell Your Content or otherwise distribute or use it outside of our provision of the Service."

It also says "You still have control over your content".

Is a subarctic vault really within the ordinary course of providing the service? Did content owners have an opportunity to exert any control?

Most probably think it's neat, but GitHub would be naive to imagine everyone would consent.

Also what happens if it turns out one of those repos had personal information in it and the subject makes a GDPR right-to-forget demand? Are they going to drag it out and purge that bit of tape?

>Also what happens if it turns out one of those repos had personal information in it and the subject makes a GDPR right-to-forget demand? Are they going to drag it out and purge that bit of tape?

I believe GDPR has exemptions for archives ([0] section 28) so that's less of a concern for them I imagine. I recognise what you're saying, but I think anyone _very_ opposed would have a difficult time in court arguing GitHub should remove their work/name/etc. My (very loose) understanding of the law is that they would have to demonstrate some kind of loss. That being said, GitHub could just have sent a notification email with very little effort. Maybe 'no harm, no foul' applies here?

[0] https://www.legislation.gov.uk/ukpga/2018/12/schedule/2/part...

Hi Jon, Congratulations on moving forward with this. Thank you! If you ever think about what might come next in terms of being able to re-make computers and so on from scratch, here is a concept website I put up around 1999 (when I was trying to get NASA to support the work for space settlements). I still work on the general idea on-and-off in my spare time (generally at a more abstract level of software for sensemaking and organizing information) but so many other distractions get in the way: https://www.kurtz-fernhout.com/oscomak/goals.htm

From there: "The OSCOMAK project is an attempt to create a core of communities more in control of their technological destiny and its social implications. No single design for a community or technology will please everyone, or even many people. Nor would a single design be likely to survive. So this project endeavors to gather information and to develop tools and processes that all fit together conceptually like Tinkertoys or Legos. The result will be a library of possibilities that individuals in a community can use to achieve any degree of self-sufficiency and self-replication within any size community, from one person to a billion people. Within every community people will interact with these possibilities by using them and extending them to design a community economy and physical layout that suits their needs and ideas. As the internet has grown, it has enabled collaborative work which has created many success stories, including Linux, Python, GCC, Squeak and other projects. We want to harness that power and apply it to organizing technological knowledge in concert with many interested individuals. The main project goal is to develop an on-line library of technology ideas, techniques, and tools, including a range from high-tech processes like plastics to medium-tech like ceramic houses to low-tech like spinning wheels. Also included will be biotechnology processes, like perennial agriculture, companion planting, sheep farming, and eventually cloning and DNA synthesis. One process to be included is a way to convert the high-tech computerized library to a low-tech paper one as desired. Key to the whole endeavor will be to present everything in a how-to fashion. Also needed is a way to map out and simulate the interrelations of processes; for instance, sheep raising requires veterinarians, antibiotics, feed, fencing, and shears; shears require a blacksmith, metal, and a furnace. 
This latter feature also would be used to keep track of the product flows into, out of, and within a community's entire economy."

> Also worth noting we'll be storing "greatest hits" copies of the ~15,000 most-starred / most-relied-on repos, along with a sampling of several thousand repos with few/no stars

Making all of this code essentially useless. You'd need to store those repos and their entire dependency tree.