by est31 2638 days ago
> Now we are trying to keep every one of those insignificant things, for what purpose?

What seems unimportant and insignificant to us now might become very interesting in the future. People in the future might consider the arrival of the internet as the beginning of a new age. Furthermore, some things may seem obvious to a person born in our times, but might not be obvious to people born in the future.

So many works, including even movies from the 20th century, are lost now, e.g. the original version of Metropolis shown at its premiere screening. Diogenes wasn't just a guy living in the town square; he also produced written works, and they are lost now too.


I had roomfuls of boxes of things that I thought might one day be important, or significant to someone. Then I realized, when I die, all those things are just going to be thrown away or donated to a thrift store, and it's highly unlikely that anyone who buys any of that will find the same value in it that I thought it had. If I have created anything truly timeless and worthwhile, then it had better be something I created for a client and got paid for making, otherwise it's probably just going to end up in a garbage heap. So I made a decision to get rid of all of it. Time is too short to be held down by the (truly) endless possibilities of "what if" this or that thing ends up being useful to someone in the future.
Yeah, keeping physical things around only makes sense if they are actually valuable to someone. Archives only accept objects that are actually worth keeping. But it's different for digital content, as it's so much easier to store. At least for now.

Also, archaeologists love the garbage dumps and cesspits of old towns, literally the places where people put their least valuable things, because they weren't raided by earlier visitors and so much information can be derived from them. The deposits are also nicely stratified, which gives a rough chronology.

Another way to look at this is that the more you store, the more difficult it is for people to actually find the valuable parts of what you've stored. One of the fascinating aspects of the internet is that our internet lives diverge so thoroughly from our offline lives - so the data we're leaving for the future is arguably horribly unrepresentative.
> difficult it is for people to actually find the valuable parts

That's why you need to catalog stuff.

And if you have stored something, you can always get rid of it if you deem it to be unimportant, but if you haven't stored it, most times you can't get it back. Erring on the side of storing unimportant things is an important strategy to cope with that.

The thing with internet content is that it's indexable and it's possible for every person interested to check it out. If you have boxes stored in some shed - it's much less so.
I'm less worried about the curiosity of people decades or centuries in the future than I am about the privacy interests of former Google+ users right now. Many of them chose that social network specifically because it was supposed to offer them better privacy protections. That isn't a litmus test for whether or not they'd be OK with having their profile scraped and archived, but it is at least suggestive.

It would be polite if ArchiveTeam were to now contact everyone whose profile they have scraped, and ask for permission to retain that data. And then delete all the profiles for which they didn't get affirmative consent.

The content IA are archiving was public and publicly accessible.
Well, for one, ArchiveTeam is unaffiliated with Internet Archive. Judging by the website, it's more closely associated with 4chan or Encyclopedia Dramatica or something.

For two, being public and publicly accessible doesn't mean it isn't gauche to scrape it. It's kind of like how nobody sticks a sign that says "take one only" over the bowl of mints at a restaurant; it's assumed you just know that it's not cool to stick a whole handful of them into your pocket.

AT are not affiliated with IA, but work closely with them, and the data AT collects is transferred to IA. Therefore, my characterisation of the data as "archived by the Internet Archive" is substantively accurate.

IA aren't scraping the data themselves, but they're the customer.

As for your second point: there's merit to that argument, and I've discussed same previously -- I'm very much of mixed minds on this. A few considerations weigh strongly, however.

0. The data were already public, as noted.

1. The system shutdown was not a known factor when most of the data were created. The expectation at that time was that the data would continue to exist.

2. The shutdown itself has occurred in a context in which individuals, and far more importantly groups, quite literally could not archive the relevant data themselves. Google's own Data Takeout, whilst fairly remarkable (in a positive sense) within the industry, makes many things difficult or impossible. Ordinary users cannot archive Community content, and even Owner and Moderator roles within communities could only archive posts from public communities -- neither comments nor private communities were archivable. (Third-party tools could provide these capabilities.) Moreover, technically-, cost-, bandwidth-, or storage-constrained users or communities largely had no viable options for saving their own legacies.

3. The contents sitting on Google+, indexed and searchable both on the site and via the public Web, were far more visible than they will be at the Internet Archive, which does not support full-text search of its archives (at least not yet), and which is not as effectively indexed publicly as Google+ was.

4. The Internet Archive does provide for content removal under the DMCA, as well as other mechanisms. For a G+ user, given how content URLs are constructed (they all include the user's G+ UUID as a common element), requesting removal of an entire tree is trivial.

On balance, this favours the Archive.

I can't feel sorry for someone who goes to Google for better privacy protections. I mean, really?
It's ok to lose stuff. If we save everything, there will be too much stuff, and only a fraction will ever be used. I'm not convinced of the value. Perhaps people should do their own filtering and saving.
It's always surprising to see how much relatively recent stuff is lost on https://www.lostmediawiki.com/
> What seems unimportant and insignificant to us now might become very interesting in the future.

Often this argument is used to try to justify things as important as mass-surveillance and as small as logging in a software project. “Just collect and retain everything, who knows what we’ll actually need!” With GDPR and increasing focus on massive & intrusive data collection, I think this mindset is going to have to change. Before deciding to preserve or collect information (particularly information of a personal nature, like social media accounts), you should be prepared to justify the activity. “It might be useful one day, maybe” shouldn’t be good enough.