by drngdds 2376 days ago
Tumblr blogs often include people's names, faces, and/or details about their personal lives. That's very much personally identifiable information! And while they did post it publicly, they likely didn't do so with the intention of it being saved forever in a publicly available, easily searchable archive. This applies especially to porn blogs where people post their own original content.

There's certainly value in archiving social media but I think it has to be balanced against the harms, instead of defending the practice with literal religious fervor and dismissing all criticism out of hand.

3 comments

Was there something in particular I said that you felt was defending it with "literal religious fervor and dismissing all criticism out of hand", or were you referring to my grandparent? I don't think I dismissed anything out of hand; I specifically acknowledged both the value of ephemerality and the point that traditional libraries are curated.

I agree that there is a danger that people may not realize how public and permanent the things they published to Tumblr were, or how dangerous it can be to do so (and I downvoted a sibling comment dismissing this danger). However, I think you and I have different threat models.

In my mind, archiving PII that is intentionally published is not particularly harmful because most lay people do, in fact, understand that their avatar, username, and, by default, posts are public on Tumblr. They have had the opportunity to remove that information this whole time, and they still do: Archive.org removes stuff if you ask them.

By contrast, lay people have no mental model for what kind of information is incidentally collected, nor for how dangerous or benign it is. Certainly, lay people also can and do misjudge how public and how dangerous the things they intentionally publish are, but the gap is far, far smaller than it is for incidental information. "Would you tell a stranger this" or "would you write this on a bathroom wall" are decent heuristics: the only difference in danger between text written on a bathroom wall and text written on Tumblr is the potentially wider reach on Tumblr, including the possibility of going viral. (Photos, of course, can also subtly compromise privacy in ways surprising to a lay person, but the gap is still much smaller than it is for incidental information.)

In my threat model, that gap in understanding is much, much more dangerous than the intrinsic danger of PII. That's why I think that, as long as Archive.org has a usable removal process, pretty much all the danger is in surveillance capitalism's collection of incidental information, not Archive.org's permanent record of intentionally publicized information.

The reason we fight against censorship (which is what this debate comes down to) with literal religious fervor is that that's how the other side fights for it.

Don't want it archived forever? Don't put it on the Internet. Seems simple enough.

If Archive.org had your attitude, I would actively oppose it. Removing private, personal info is not censorship. And nothing about "just don't put it on the Internet" is simple. What if someone hacked your devices and then put it on the Internet for lolz? What if you shared it in confidence with someone you trusted, who is intentionally putting it on the Internet to hurt you? What if you accidentally pasted the wrong thing or uploaded the wrong file? What if you were a child and didn't understand the dangers?

There obviously should be ways to ameliorate your mistake, which is why it is absolutely critical that Archive.org has a removal process.

Many people writing personal diaries/letters probably didn't do so with the intention of it being saved forever in a publicly available, easily searchable archive.

Yet such data is invaluable to historians and can give us a window in time through the eyes of people who lived through it. Having that publicly available data lost for all time would be an immense loss to future generations.

I'm sure in a few generations, some historians will study those archived porn blogs and get an insight into the evolution of humans' relations to sexuality that today's historians can only dream of.