by VonGuard 2379 days ago
You're right, let's burn the library down because one book has a liable chapter in it.

This argument is so horrible as to be actively harmful to Archive's work. Jason Scott is a god, and if we didn't have him, we'd have to invent him.

WE DO NOT GET TO CHOOSE WHAT THE FUTURE FINDS INTERESTING.

We live in the only point in human history where we can actually save all of humanity's knowledge and culture, and we can do so without having to worry about physical space or staff to work the "library." It's a remarkable time we live in, and yet, 99% of our society either doesn't care, thinks this work is stupid, or actively works against it through horrific copyright laws.

We know more about how Rembrandt painted and lived than we do about how Atari 2600 programmers worked and lived. I can go to Rembrandt's house and see where he lived, where he painted, how he worked, where he slept and ate and mixed his paints and taught his classes.

Atari's old HQ is just another office building. The source code to those games is mostly gone (thankfully, it's assembly and easier to disassemble). We need to save our culture and digital heritage, else we forget where we come from.

Deleting some old tweets is one thing, but actively worrying about Archive's work is just harmful to us all. We need 10,000 more Archives, dammit. It's supremely important work that is helping stem the tide of lost culture due to stock market forces. Geocities is gone forever because Yahoo! didn't find it profitable. This cannot keep happening.

9 comments

I’m not convinced it’s dangerous to explore whether there are benefits to ephemerality.

I’m also not sure your Rembrandt example shows what you suggest it does. The average Atari 2600 programmer would be more equivalent to the hundreds of now unknown artists in Rembrandt’s time. The John Carmacks of today will be remembered in detail with or without blanket archive efforts.

Maybe, just maybe, Rembrandt’s status in our minds is a result of generations of people each seeing the individual value in his work. That is, each generation does indeed get to decide what future generations remember. Or at least that used to be true until the digital age.

Maybe the change is an improvement. But maybe not.

And libraries are the epitome of what you’re fighting against. They are by definition works chosen by humans based on judgment calls of their perceived value.

Let’s at least acknowledge that blanket archive efforts are a fundamental change in themselves and a departure from the human status quo for thousands of years. Then let’s debate whether the change is an unabated good.

While I don't endorse your parent's over-the-top rhetoric, and I do agree that there is value in ephemerality and that it's worth noting that libraries are more carefully curated than a dump-and-archive, I think it's also worth noting that these are generally public pages.

All the stuff Tumblr users intentionally wrote and published publicly, but none of their IP address logs and other incidentally collected information, is exactly what ought to be archived and preserved, in my opinion. This is in strong contrast to incidentally collected data including clear PII like IP addresses that many companies today are hoarding forever, when they ought to be ephemeral.

Tumblr blogs often include people's names, faces, and/or details about their personal lives. That's very much personally identifiable information! And while they did post it publicly, they likely didn't do so with the intention of it being saved forever in a publicly available, easily searchable archive. This especially applies for porn blogs where people post their own original content.

There's certainly value in archiving social media but I think it has to be balanced against the harms, instead of defending the practice with literal religious fervor and dismissing all criticism out of hand.

Was there something in particular I said that you felt was defending it with "literal religious fervor and dismissing all criticism out of hand", or were you referring to my grandparent? I don't think I dismissed anything out of hand, I specifically acknowledged both the value of ephemerality and the point that traditional libraries are curated.

I agree that there is a danger that people may not realize how public and permanent the things they published to Tumblr were, or how dangerous it can be to do so (and I downvoted a sibling comment dismissing this danger). However, I think you and I have different threat models.

In my mind, archiving PII that is intentionally published is not particularly harmful because most lay people do, in fact, understand that their avatar, username, and by default, posts are public on Tumblr. They have had the opportunity to remove that information this whole time, and they still do: Archive.org removes stuff if you ask them.

By contrast, lay people have no mental model for what kind of information is incidentally collected nor how dangerous or benign it is. Certainly, lay people also can and do misjudge how public and how dangerous the things they intentionally publish are, but the gap is far, far less than incidental information. "Would you tell a stranger this" or "would you write this on a bathroom wall" are decent heuristics: the only difference in danger between text written on a bathroom wall and written on Tumblr is due solely to the potentially wider reach and possibility of even going viral on Tumblr. (Photos, of course, can also subtly compromise privacy in ways surprising to a lay person, but the gap is still much smaller than incidental information.)

In my threat model, that gap in understanding is much, much more dangerous than the intrinsic danger of PII. That's why I think that, as long as Archive.org has a usable removal process, pretty much all the danger is in surveillance capitalism's collection of incidental information, not Archive.org's permanent record of intentionally publicized information.

The reason we fight against censorship (which is what this debate comes down to) with literal religious fervor is because that's how the other side fights for it.

Don't want it archived forever? Don't put it on the Internet. Seems simple enough.

If Archive.org had your attitude, I would actively oppose it. Removing private, personal info is not censorship. And nothing about "just don't put it on the Internet" is simple. What if someone hacked your devices and then put it on the Internet for lolz? What if you shared it in confidence with someone you trusted, who is intentionally putting it on the Internet to hurt you? What if you accidentally pasted the wrong thing or uploaded the wrong file? What if you were a child and didn't understand the dangers?

There obviously should be ways to ameliorate your mistake, which is why it is absolutely critical that Archive.org has a removal process.

Many people writing personal diaries/letters probably didn't do so with the intention of it being saved forever in a publicly available, easily searchable archive.

Yet such data is invaluable to historians and can give us a window in time through the eyes of people who lived that time. Having that publicly available data lost for all time would be an immense loss to future generations.

I'm sure in a few generations, some historians will study those archived porn blogs and get an insight on the evolution of humans' relations to sexuality that today's historians can only dream of.

Ironically, IP addresses are probably the _least_ personally identifiable bit of information in a lot of that stuff. Most people's IPs are assigned to someone else within months, or even hours. But a username, profile picture, etc? Those are potentially identifiable.

In a reply to your sibling I explain how in my view, the fact that lay people have no mental model of what kind of information can be incidentally collected and how dangerous it is, whereas lay people are much more capable of understanding the dangers of a personally identifiable username, profile pic, and personal details revealed in posts, makes the former far more dangerous than the latter.

> The John Carmacks of today will be remembered in detail with or without blanket archive efforts.

Sure, but this leaves us a distorted view of history, where we have lots of details on the lives of "great men" and next to none on how ordinary people lived. Which means the vast majority of people who lived and died in that period end up written out of their own history.

Archaeologists spend a lot of time rooting around in ancient rubbish piles and cesspools, because these are some of the very few places where physical evidence of how ordinary people lived has survived. Nobody in ancient times would have nominated those sites as culturally important or worthy of preservation. But what we know of how ordinary people in those times worked, played, ate and drank comes largely from things dug up from them.

I certainly am sympathetic to the preservationist mindset. OTOH, even if we restrict ourselves to content that is natively created in digital form, the amount of "stuff" that comes into existence every day--much of it not on the public web or public social media--is staggering. (And much is not public for good reasons.)

I'm not convinced that we should feel a compulsion to save all of that. Just because it's more practical to be a pack rat about digital content doesn't mean that, taken to extremes, it doesn't still seem like being a pack rat.

In 2008 I found a parcel of bare EPROMs at a flea market containing 27 games. One of those games was Cabbage Patch Kids Adventures in the Park, and it was spread across 12 chips, each one showing a progressive state of development across 9 months.

To my mind, this was the only known find of a vintage Atari 2600 game and its iterative development process. So, 30 years later, the only reason we had this snapshot is because someone found these chips and sold them at the flea.

The current state of digital preservation is abhorrent. Those ROMs would have taken up less than 1/4 of a 5.25" floppy, but the company behind them never thought to preserve that information or data.

Take2 Interactive republished BioShock in 2012. They couldn't find their source code. They didn't save it. They had to go machine to machine looking for it. The reissued game is not the same as the original.

As a society, we don't place any value on this stuff, but the potential value of it cannot be understood until the future has occurred. Letting it vanish is a disservice to the future. In the past, if a book was published, it wasn't going to vanish if the publisher went out of business; there would simply be no new copies.

In our digital online age, things vanish in seconds, days and hours. This is also a very different state of affairs. In the past we could not save everything, but everything didn't have a clock counting down from the end of the quarter over its head, counting the seconds until it is deleted.

The Library of Congress tries to save everything. Yes, libraries weed the stacks and choose items to host. This is due to space concerns: they can't host everything ever. Digitally, they can, and many host reams of microfilm and old newspapers because they can.

Libraries can, thanks to tech, now host every book ever, digitally, for very low costs. Copyright prevents that.

This is an unabated good. Leaving things behind and forgetting them is how you get Tulsa, Oklahoma, or the Armenian Genocide denials. We don't get to choose what the future finds interesting, and for the first time in history, we do not have to. Why in the ever-loving fuck would you worry about that?

Most likely, only for personal reasons. This is a humanity level problem. Your personal worries are irrelevant in 100 years when everyone who ever knew you is dead anyway. Geocities would be more interesting at that time, as a subject of study.

Library of Congress, British Library, Bibliothèque Nationale etc choose to save everything they are mandated to, and a fair bit extra besides. That includes everything published. They don't save their water cooler chats, personal letters and everything sent by post, everything said on the phone or Facebook, etc.

The bar - perhaps found accidentally - seems quite important in deciding what must be archived, and what probably shouldn't.

Archives of personal letters and ephemera, preserved in manuscript/special collections libraries, are incredibly important research sources. This often includes letters which were never meant to be published. LOC had a project to preserve every tweet (published to the world) until a few years ago - who knows what tweets might be useful to future researchers?

And yet, hundreds of years later historians and linguists crave letters, and post, and telegrams to get a glimpse of actual life outside official publications.

Sure, and a hundred or more years later the family of the author, or relatives of the recipient can decide to release the family letters or telegram from WW1 or the US Civil War etc. That delay, usually at least until the correspondents have died, is important. The affair, the less than ideal belief, and all that other imperfect demonstration of humanity can no longer hurt or embarrass. It ceases to be private and personal and moves into the historic.

Releasing them whilst the (probably famous) sender is alive is most often done to cause damage, is simply tasteless, or amounts to paid-for revelations in the gutter press.

Are these EPROMs archived or available to play anywhere? You've got me curious!

(amateur digital archivist and data recovery hw/sw dev here, I find this stuff fascinating!)

> Leaving things behind and forgetting them is how you get Tulsa, Oklahoma, or the Armenian Genocide denials. We don't get to choose what the future finds interesting, and for the first time in history, we do not have to.

There is plenty of evidence for the Armenian genocide, the Holocaust, and 9/11. That doesn’t really stop deniers or conspiracy theorists. When it becomes politically advantageous, spreading misinformation becomes weaponized and mainstream. A bunch of nerds saving some ROM dumps isn’t going to really change that.

Like the Library of Alexandria, it’s also quite idealistic to think archive.org will be around in 100 years or more. Not that we shouldn’t do it... but the future can be unkind to even the most modern technology.

> We live in the only point in human history where we can actually save all of humanity's knowledge and culture,

Because of the welter of proprietary, undocumented formats, media bitrot and the like, we are actually moving away from such a point.

Turns out historians may not be so upset. You can be a historian of early medieval France and have a chance of reading 100% of the surviving documentation. Too much data can obscure the story.

Of course historically you could get a PhD for compiling a concordance to Shakespeare, something that can now be done mechanically in seconds. Future historians could (and will) apply the same tools to today's surviving documentation. But I don't believe there'll be as much of it as you seem to think.
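To make the "mechanically in seconds" point concrete, here is a minimal sketch of a keyword-in-context concordance in Python. The function name `build_concordance`, the `width` parameter, and the one-line sample text are all made up for illustration; a real concordance would also need decisions about spelling variants, lemmatization, and line references that this ignores.

```python
import re
from collections import defaultdict

def build_concordance(text, width=3):
    """Map each lowercased word to a list of (position, context) pairs.

    position is the word's index in the text; context is the word plus
    up to `width` words on either side.
    """
    words = re.findall(r"[A-Za-z']+", text)
    concordance = defaultdict(list)
    for i, word in enumerate(words):
        start = max(0, i - width)
        context = " ".join(words[start:i + width + 1])
        concordance[word.lower()].append((i, context))
    return dict(concordance)

# Illustrative sample only; any plain-text corpus could be passed in instead.
sample = "To be, or not to be, that is the question"
conc = build_concordance(sample, width=2)
for position, context in conc["be"]:
    print(position, context)
```

Every occurrence of a word is listed with its neighbours, which is roughly what the hand-compiled concordances provided, minus the scholarly judgment.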

>Because of the welter of proprietary, undocumented formats, media bitrot and the like, we are actually moving away from such a point.

The best we can probably say is that it's different. We're capable of saving far more but, in practice, a lot of digital media is locked up in walled gardens and accounts that have to be paid for and require logins.

It's presumably easier to save a bunch of photographs or videos in a way that they'll be accessible so long as key Internet sites or their successors are. A fire or flood probably won't destroy them. OTOH, unless you've taken affirmative steps to upload that media to the right place, it won't be serendipitously discovered in a shoebox some day in the future.

> Turns out historians may not be so upset. You can be a historian of early medieval France and have a chance of reading 100% of the surviving documentation. Too much data can obscure the story.

I don't understand this reasoning. Yes, more data = more work, but less data = more likely you're wrong.

Less data: less likelihood someone will challenge you before you get tenure.

The incentives in academia are messed up.

Is your entire original comment meant to be read as sarcasm then?

Or are you advocating for the idea that history is arbitrary and it’s better to just have a simple story than to have to worry about what really happened?

I’m having a hard time understanding what idea you’re trying to position in this debate.

I was making three points in the three paragraphs:

1 - it's more likely we will be an information-sparse region in the historical record rather than an information-dense region.

2 - professional historians have their own set of incentives which can be counterintuitive to the layperson.

3 - but indeed if there turns out to be a huge amount of stuff (there will likely be mountains of some forms of ephemera) to go through some people may be able to find value using new tools not available in the past to historians.

As someone trained as (but never worked as) a historian I do indeed have a bit of cynicism on point 2. I suspect most if not all actually working in that domain have the same cynicism.

Somehow "all available documentation about medieval France" ended up being "oh, we don't know exactly, a lot is guesswork".

And that's France. What about Medieval Africa? The Americas?

Oh, we mostly burned that documentation because "it wasn't important". Right?

My understanding is that the Jesuits burned the Mesoamerican literature because they thought it was dangerous to their reign, not harmless. An appalling crime.

> You're right, let's burn the library down because one book has a liable chapter in it.

Jeez... You may want to re-read my comment. I have written no such thing, and it is not my opinion at all.

This comment would be better without the first three paragraphs, which actively detract from what you're trying to say.

> You're right, let's burn the library down because one book has a liable chapter in it.

I feel you got the comment backwards: a better analogy would be "if a used-books store full of Dan Browns were to burn down, would we regret the loss of maybe one chapter that has some value?"

Your position seems to be "yes", but I wouldn't dismiss so easily the opposite view: that 90% of everything is crap, and that keeping everything forever "just in case" sounds surprisingly similar to hoarding.

I do not oppose "purposeful archiving" - as someone mentioned, saving outgoing Wikipedia links seems smart. But my old twitter account, where I kept track of missed trains? There are better sources for that, and no one missed it when it was gone.

It's almost impossible to evaluate what is of lasting value in the moment, while it is readily available.

Imagine an author writes a paperback. It isn't very good, but a few people read it. Later, one of those people goes on to rework some of those ideas into their own script for a film. The film is a success. Years later, the scriptwriter mentions the paperback as an inspiration while giving an interview, but it's long out of print.

To a biographer or a devoted fan of the film, this forgotten book, while of little value in and of itself has become a valuable part of a larger story. If it were culled when the contents of that used book store burned down, we would have lost something without realizing it. And that's how we lose most things. The only way to minimize this is to store as much as we can, in the hopes that we may find a use someday, and thankfully digital storage has made this very, very cheap. The opportunity cost is tiny, and the potential reward, given enough time, is unbounded.

But the opportunity cost is not tiny. This is literally a twitter thread asking for financial support.

And I do recognize that the thousands of petabytes will likely be chump change to store in a decade... but necessarily the economies of storage will keep pace with the rate of content production. It will always be expensive to store everything.

The question is, do we gain back this investment from future uses of these archives? I dunno. I’d be interested to hear what value archivists have gotten out of the archives, given it is decently old already.

Like with other "90% of X is wasted" sayings, you don't know which 90% it is.

Even if you look at classical art with an honest eye, you can find plenty of works that in themselves are, well, crap - but they're being preserved and reproduced and talked about, because they acquired meaning over time. They've become relevant in context.

Take your old Twitter account. It's probably not interesting. It probably won't ever be. But it might. Imagine several decades from now, your great-granddaughter becomes a well-known, influential politician. This might retroactively and posthumously make you relevant, and in the process your Twitter account. Biographers might find it useful. Or independently, people who're into historical train schedules. Etc.

It's near-impossible to predict what the future will find relevant, so if storing some memories is nearly free on the margin - as it is today, with digital technologies - then just storing it is a no-brainer.

I think a better analogy than a library would be your average day in the office: would you want everything you say and do in the office recorded for eternity? Sure it would help, say, catch fraudsters, track responsibility and credit, allow sociologists fascinating analysis - but is that worth it? The >1GB of Google+ is a good example. Probably many interesting posts from people that are the core experts on topic X - and many nonsensical Twitter-like posts of people sharing whatever they encountered or thought that day.

>> We live in the only point in human history where we can actually save all of humanity's knowledge and culture

Playing devil's advocate here for a moment. . .

Considering we as humans learn little from our past, keeping all of this knowledge is a benefit to whom then? Some people who feel nostalgic about Sony's first Walkman? Or maybe people using it for nefarious reasons? If humans continue to make the same historical mistakes over and over, what benefit does the human race gain from cataloging all this information? I would venture to guess it's more plausible it will be used against us instead of furthering our own culture.

>> We know more about how Rembrandt painted and lived than we do about how Atari 2600 programmers worked and lived

There is a huge difference between saving all of Rembrandt's stuff and saving that of some 22-year-old college-dropout programmer who created a video game in the heyday of a long-forgotten startup company. And yeah, there have been numerous documentaries and articles written about Atari in those early days. Who would want to save a dilapidated roller rink under the auspices that a great and noble video game company used it as their HQ for a few years??

https://www.polygon.com/2018/7/6/17542154/atari-book-valley-...

But then this roller rink down the block became available: 10,000 square feet! I mean, we were just jam-packed, and we had people on roller skates actually running around on the roller-skate rink building Pongs.

While I do think leaving certain things to the sands of time is a good thing, vacuuming up everything is just as worrisome. Are we going to be hoarders of a bygone technological past where a large majority of the "stuff" we save will have little, if any use to anybody anymore??

Having a background in anthropology, I find it fascinating there will be many generations of kids who leave no physical trace of their existence since a large majority will be in electronic form. Just imagine how people's lives are in a sort of suspended animation after passing away and having their Facebook pages live on forever.

It's a bit hard to wrap your head around tbh.

I would say there are several classifications of things worth saving through a broad net:

- kindling sources, like a LiveJournal post that inspired Lin Manuel to write Hamilton (for a fake example)

- early work of a future star, like imagine Lorde posted early songs to MySpace. This is already a clear issue as many posted songs have been deleted or lost for various reasons.

- valuable things on shaky ground. Yahoo Groups, for the latest example. But I just saw on Reddit someone was looking for a deleted scene from Blair Witch Project that was supposedly the first video ever published on Amazon Prime Video... and now it has nearly vanished. That seems crazy to me from so many directions.

- the value of the ephemeral. Gold and jewelry from old civilizations is nice but we know so much of how people actually lived by examining their garbage, scrap notes, broken bowls, etc.

- the myth of permanence. We feel like if 10 million people see a video, it is probably preserved. But there are no master tapes of any of this and so much of everything is interlinked and hard to piece together after the fact. What was people's tech stack when they were making MySpace? How big were people's hard drives? Did they share songs through Kazaa or play them on MySpace directly? What was the state of Javascript then, what were the security issues or underground trends? How did songs propagate, where were they shared? Were people sending links in email or AIM, were people sharing links on Digg? This is stuff from like a decade or two ago and already you need to think like an archaeologist to have any sense of how the culture really existed because there were so many moving parts from year to year.

- the value of datasets. Imagine putting some thought against the Geocities archive to see how HTML blink tags grew then fell in popularity over time. Or how a meme propagated, or analyze the link structure between groups of people or by topic, or make any number of interesting inquiries about how humans operate culturally in digital space and how we interact socially through a certain set of tools and limitations. There are very interesting possibilities here for understanding ourselves better as a species.
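As a rough illustration of the blink-tag idea in the last bullet, here is a minimal Python sketch. It assumes the archive has already been reduced to (year, html) pairs, which is the hard part in practice; the function name `blink_share_by_year` and the sample pages are invented for the example and are not real Geocities data.

```python
import re
from collections import Counter

# Matches an opening <blink> tag in any letter case.
BLINK_TAG = re.compile(r"<blink\b", re.IGNORECASE)

def blink_share_by_year(pages):
    """pages: iterable of (year, html) pairs.

    Returns {year: fraction of that year's pages containing a <blink> tag}.
    """
    totals = Counter()
    with_blink = Counter()
    for year, html in pages:
        totals[year] += 1
        if BLINK_TAG.search(html):
            with_blink[year] += 1
    return {year: with_blink[year] / totals[year] for year in sorted(totals)}

# Made-up sample pages, not real archive data.
sample_pages = [
    (1996, "<html><BLINK>Welcome!</BLINK></html>"),
    (1996, "<html><p>Plain page</p></html>"),
    (1998, "<html><blink>Under construction</blink></html>"),
    (2004, "<html><p>No blinking here</p></html>"),
]
shares = blink_share_by_year(sample_pages)
print(shares)
```

The same shape (group by year, count pages matching a pattern) would work for other markers of web fashion, though dating archived pages reliably is its own problem.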

>You're right, let's burn the library down because one book has a liable chapter in it.

It is more like, either you burn the library down or every thing you have written in your private journal is now available to be checked out by anyone.

It really shouldn't be that way and I think we should fix the problem of holding people responsible for bad behavior in the past. But how do we draw lines (for example, what about holding people responsible for past crimes)?

>We need to save our culture and digital heritage, else we forget where we come from.

I agree, but we also need to ensure this is done without costing individuals. Technology has advanced, but society has not. Our technology outpacing our culture has hurt and will continue to hurt many people, and we should try to find a way to fix it.

It is appalling to me that the parent comment is being downvoted. Religious fervor indeed. The point being made is simply that saving every scrap of history including personal tracking and details that are normally LOST to history, is a sea-change in human history and shouldn't be looked at lightly.

There is value in forgetting. What we forget, then, is a very relevant question. "NEVER FORGET ANYTHING RAWR!" is not a useful point of view because it denies the very right to have a conversation on the subject.

If you can't agree that I should have some say in what is remembered (or at least archived) about me or generated by me, there's not much we can talk about.

It is also a matter of consent.

And to make it even clearer, one just needs to think about leaked images. Should we not allow a person to delete such images leaked without their consent?

1,000 years from now archaeologists may have some academic interest, and those involved and even generations of their descendants will be long dead.

But what about 1 year from now? Benefiting those 1,000 years from now at the cost of those alive today is a hard position to justify, especially with such a blanket justification.

The Circle (both book and film) was, sadly, a largely botched attempt to explore issues like privacy, the power of tech companies, widespread surveillance, etc.

Missed opportunity in that the book was only readable if you took it as a deliberately over the top "if this goes on" fable. And the film was mostly notable for how it squandered a top-notch cast.

> Atari's old HQ is just another office building. The source code to those games is mostly gone (thankfully, it's assembly and easier to disassemble). We need to save our culture and digital heritage, else we forget where we come from.

Very good point!!!

In my view the Internet Archive should play the digital equivalent of the role of the National Register of Historic Places (NRHP). Shepherds of documentation, to give it a cool-ish sounding name.