by pariahHN 2388 days ago
Even if we still had the Library of Alexandria, it may have shed zero light on the actual lives of citizens. Archiving content on the internet means capturing thousands of individual level perspectives and experiences. We don't know what will end up being important to historians 50 or 100 years from now. I would bet there are dozens if not hundreds of historians that would give anything for a record of their favorite time period that contains even a fraction of the amount of content today's archive efforts are storing.

It's also not horrendously expensive - we are getting better and better at storage as well as data analysis techniques, so stuff that seems useless today may be useful 50 years from now and cost less to store than it does now. The key thing again being that we can't benefit from hindsight.

Even graffiti can give insight into a time period, even if that insight is that that time period had an unusually high number of graffiti artists.

4 comments

Not to mention that historians of the future will be able to sort and characterize massive amounts of data and draw conclusions that couldn't be made without that data.

For a time period where data is more valuable than oil, where the wealthiest companies are trying to grab every piece of data they can, and on a site where this is frequently discussed and many work for said companies, I find the question "why do archivists want to archive data?" a little silly. Data might not be useful to us now, but might be to future historians (though this is similar to the argument made by the companies that do mass surveillance).

What about people who don't want stupid comments they made online when they were 14 permanently indexed and searchable for all of time by the Archive Team? Yes, they may have posted to Yahoo! Groups back in 1999 when they didn't know better, but now it's 2019 and you have people digging up decades-old dirt on people to try and destroy their reputations and careers.

Given that search engines have zero ethics when it comes to removing embarrassing (but not illegal) content, sometimes the loss of information is a small blessing for some.

Yes, it's their fault, but I also don't think it's fair that something a child said at 14 should haunt them their entire professional careers, either.

The stuff stored in the Yahoo groups is material from the beginning of the internet, when people explored what could be possible and how easy it was to connect globally. You have a valid point, but it's also one of these things in our generation that we have to live with. We explored and tried things. Only now do we look back and see what those explorations of our younger selves really are; sometimes funny, sometimes embarrassing. However, if you are cautious, you may be able to delete your stuff or at least make it anonymous by deleting the account in question. If not, you have to live with it. All of these people can now learn from it and teach their kids to be careful with the internet. (Or at least this is how it should be)

The dogma that "everything posted to the internet will stay on the internet" may not be entirely true for this first generation, because large parts are already gone. But I am certain that it will be very true for the current generation, because I really doubt that Facebook and others will ever willingly delete large datasets of user content.

Given that search engines have zero ethics when it comes to removing embarrassing (but not illegal) content,

Ethics are about codified sets of rules. Perhaps they're just following a set of rules that doesn't promote hiding things to make people feel better?

The archives are not easily indexable by search engines; they're posted as multi-GB gzip-compressed WARC files.
But someone could hypothetically convert the WARC files back to static HTML and host them on the clear web.
Hypothetically, yes; but right now all this stuff is available on the clearnet and searchable. So obviously any potential harm is decreased relative to the present situation. And, unless your argument is that we should delete all fora on the web because someone may have said something embarrassing on them, then I think you'd probably want to come down on the side of preservation.
I'm pretty sure Yahoo isn't doing this to protect people from their old posts.
IA are extremely responsive in delisting content on request.

Email info@archive.org

Withhold wide-scale, anonymous access for a few decades maybe? (Though presumably there is a middle ground that doesn't involve leaving _everything_ inaccessible for a few decades.)
For example: World War Two groups where many of the members have passed away by now. There could be first-hand accounts of history that has already been lost to time.
Could?

More like definitely.

YES! It's like preserving ecological diversity. It's a store for later learning. Verizon is working in cold hard capitalism, and you can bet your lunch that they did NOT use Google Groups to hold their shared wisdom/history, and they would never let it be lost.

But many don't have the pockets for better systems, and so their earned knowledge lived on Google Groups. And when you think of all the people and groups that might have needed to store their history, and what tools they might have used, what do you expect the skew of Yahoo Groups was? Certainly no Fortune 500 companies, but rather nonprofit and grassroots and all sorts of domains that are already getting the short end of the stick in our world :)

Heh *Yahoo Groups, that is