| HN Mirror



	by rasz 1434 days ago
	Isn't that a response to companies buying old unused domains, slapping a robots.txt on them, and thus killing the whole archive of the domain going back 20 years?


_ktx2 1434 days ago

Could be! However, making a direct attack on individual privacy should never have been an option. To make matters worse, the logic of, "We did this to government and military websites, so now we're going to roll it out everywhere" was quite broken for the time and remains so.

There are examples of how this works in a healthy way. Martin Manley is one scenario that comes to mind, where he overtly opted in to having an archive stored about him upon his death: https://martin-manley.eprci.com/

gojomo 1434 days ago

Neither 'flaunting privacy' nor 'direct attack on individual privacy' is a fair description of any of the Archive's web collection policies.

People who freely publish information, to the worldwide public, on the 'World Wide Web' should reasonably expect all sorts of entities to collect, save, analyze, & repurpose that info, unless they take specific steps to discourage such access & use.

The Archive's crawlers identify themselves, and collect things that are publicly linked, or specifically nominated-for-collection by library patrons or partners. Except in some focused specialized collection projects, they don't "log in" as any user, only visiting & collecting what's published freely to any anonymous person/organization/process.

For material needing more privacy, websites always have the option to block any and all unwanted visitors/crawlers with a wide variety of standard techniques, like requiring logins or simple challenges that automated crawlers won't pass.

And, as your linked articles report, the process for a later exclusion by request is pretty quick and simple. (The 2nd post concludes: "So, hats off to the Internet Archive for making the process smooth and relatively painless.") And, such exclusion does not require any sort of "DMCA request".

stubish 1433 days ago

This is victim blaming. In my jurisdiction, you retain copyright in any information you publish, even to the worldwide public. This means I can reasonably expect entities to collect, save, analyze and repurpose that info only within reason, even if I take no specific steps to discourage access & use. This is why there are laws such as 'fair use' and 'satire', because we wanted to extend what is considered reasonable use of public works. But redistributing copyrighted works without permission? Legally actionable, if you have the money and lawyers and access to the necessary courts. If this were software, such as free software license violations, people in this forum would be calling for the lawyers to nuke them from orbit.

Thankfully DMCA should make the removal process easier now, especially in situations where control over the domain has been lost or the content is hosted by a third party. Although last I saw there were still artificial barriers, such as needing to list every single individual page needing to be taken down. But this is after the fact, after you discovered your reasonable expectations and privacy have been violated. And then you have to track down the other copies that IA illegally distributed your now-private and copyrighted information to, such as a few libraries around the world with similar projects.

gojomo 1433 days ago

I'm talking about the unfair allegation of privacy violations, here.

Note that when the Archive shares crawled content with other libraries, those other libraries often have their own legal right to collect, preserve, and make-available that data, a right even stronger than the Archive's rights via fair use, implied-license, library privileges, and other grounds. For example, many of the Archive's partners in government libraries, archives, & educational institutions have a statutory right & mission to collect copies of everything 'published', including via the world-wide-web, in their sphere of national interest.

As to what some unstated jurisdiction might consider "within reason", I prefer to think they'll find reasonable what I find to be reasonable – the IA's crawling policies – unless & until some actual governing authority finds otherwise in a clearly applicable/legible decision.

See my root post (ggggggp): in a vital, evolutionary, true-law-made-on-the-ground civilization, what actually winds up as "within reason" depends on the real implementations & multi-decade demonstrations of how things can beneficially work, as much or more than any copyright loyalist's strict reading of older statutory laws.

stubish 1432 days ago

Crawling and archiving everything, including personal writings, has a chilling effect. It is the same situation people are seeing with social media, where the past remains to haunt the present and none of our future leaders are using it without a mask. It was most surprising to people when some libraries decided 'published' meant anything put on the WWW or posted to Usenet. It seemed a grasp for funding and a way to stay relevant in an age where information was moving out of published media and into opinions virtually scrawled on a toilet door. The stuff I needed to get removed from the Australian National Library's archive is exactly the sort of stuff that shouldn't be in there, directly against the statutory rights and mission, and the sort of thing that could be pointed to when you wanted to defund the project. It was there because some twit thought meaningful Australian published materials meant anything under a .au top-level domain: all the dross hoovered up by IA, including all the stuff since removed because it was in nobody's interest or was causing harm. And it was a pain in the arse.

gojomo 1431 days ago

I'm sorry you had some issues with the National Library of Australia's collections. I've never been an expert on Australian law, & it's been a while since (when I was at IA a decade+ ago) I worked with that library. But the impression I had at that time was that their governing law & budget, as dictated by the Australian legislature, required them to collect broadly, & deeply, from the `.au` domain-names. So it seemed a compulsory part of their "statutory rights & mission" then, rather than "against" such things. Their governing laws & strategies may have been modified over time since with experience – which is the point of trying, observing, correcting in new murkier frontiers of tradition, technology, and law.

On the larger issues, & specific to the Internet Archive:

You should assume there are several other larger "dark" web archives, by nations and large private organizations, collected without the public awareness or available remedies that come with the Internet Archive's or the various national libraries' public efforts. There are also uncountable other private and ad-hoc collections. Depending on what kinds of harms you expect from retained copies of older writings, these may be far larger threats than any holdings of an open, public, correctable non-profit library.

I would emphasize that anyone (like a web host or app) who gave any authors, especially the young & net-novices, the impression that something would stay private, or recallable, after being placed on a public webserver, at a published link, and open to browsing by all, did those authors a disservice by mis-informing them of risks, and the best-practices for preserving privacy.

That the Archive's well-identified, blockable crawlers sometimes surprise people with what they collect, and then make-available for lookup, helps correct that misunderstanding, both for individuals and the wider culture. Any "chilling effect" is unfortunate, but it's inherent to the web technology & practices of many independent actors. It's more documented than created by the Archive's own activities. And further – at least with respect to the Wayback Machine – the surprise availability is then fairly straightforward to undo, and prevent from recurring.

The broader risk that anything on the web – once offered to the public – will remain available from others persists no matter what the Archive does. Those concerned about such risks should take extra privacy-preserving steps, because blocking the Archive's crawls, or correcting the Wayback Machine, only limits this one polite, above-ground actor.

account42 1433 days ago

You are arguing about copyright in a thread discussing accusations of privacy violations.

_ktx2 1433 days ago

There is an overlap between the two. Copyright can be used as a defense against folk who believe, "Everything on the internet not behind authentication is commons". Often these folks point to books, magazines, etc. in support of their argument, which is certainly bad faith, but that's why copyright arguments come up.

A reference to one such comment in this thread: https://news.ycombinator.com/item?id=32150193

gojomo 1433 days ago

Wait, why are books, magazines, newspapers, newsletters, pamphlets, & flyers a bad faith analogy?

Those are exactly what hundreds-of-years of copyright law, by explicit statute and court interpretation, have addressed. The precedents for private actors, and especially noncommercial entities like libraries & schools, to retain those copies, and to a large extent, reshare/redisplay them, are very strong.

Further, by design, every delivery of content across the web necessarily creates copies at every network node, and perhaps multiple proxies/caches, on the way to the web browser. The web browser necessarily creates & displays a copy – and normally keeps one, at least for a little while for user convenience. Anyone choosing to use core web protocols has already implicitly authorized lots of necessary copying.

Why wouldn't the recipients of such display-copies, and especially non-profit libraries, have on the web the same assumed right to keep/transfer/format-shift/redisplay that freely-delivered copy, in the same way they've always had the right to do with copyrighted books/magazines/newspapers/newsletters/pamphlets/flyers?

If copyright maximalists & DRM fans want a new right to remotely recall/destroy such copies – indefinitely, retroactively, and unlike the traditional copyright balancing-of-interests – they should make the case to lawmakers & courts for that, or use the technical measures already built-into the web for expressing such limits, and opting-out of the web's and copyright's defaults. You shouldn't let them simply assert that right without reasoning or a case for why it's better than tradition. Nor should you let them allege criminality or 'bad-faith' against people just using the worldwide-web as it was designed, and enjoying readers' rights as they've been traditionally interpreted.

stubish 1432 days ago

Copyright is a mechanism used to protect privacy in these situations. When you don't have copyright, you are stuck needing a court to protect your privacy. Copyright is also what you are required to prove in order to get stuff taken down by IA when the content is not obviously illegal or personally identifiable information (or at least it was when I last needed to deal with it).

daniel_reetz 1434 days ago

With respect, I fail to see how a public website is a privacy matter.

stubish 1433 days ago

Information on a public website is public until it is taken down or the information changed. The Internet Archive removes an individual's control over when the information remains public. This is privacy. We might be caught naked, and we can't unsee what has been seen, but it is a basic human instinct to draw the curtains and contain further damage. Perfectly innocent individuals suffer because the IA rules are designed around edge cases where public figures try to hide misdeeds.

account42 1433 days ago

If you print a magazine you also don't get to recall all copies if you change your mind about something. Giving individuals this kind of control over others' ability to freely share information is dangerous because it is easily abused to hide information that is in the public's interest, and that is not an edge case at all. Making a decision to publish something on the public web is hardly analogous to being caught naked, even if you may come to regret either.

If anything, the IA should be more reluctant to remove information without a court decision.

bakugo 1433 days ago

> The Internet Archive removes an individuals control over when the information remains public.

And that's a good thing in the vast majority of cases. Unless we're talking about sensitive information that was published without the consent of the person in question, all public information should remain public forever.

stubish 1432 days ago

In my experience, it is the vast minority of cases. Most of the content of the IA is not in the public interest, now or in the future. It is crap. It is noise. It is the contents of the Internet at a point in time. Actual information is the wheat in the chaff, and why you need search engines to find it. We know this, because of the Usenet archives that are intermittently available. Almost completely useless apart from people having a giggle at how the Internet used to be, a quick browse and search for naughty words. And a few gems in the mountain of noise, in such dire need of curation people hardly know it exists and barely justifiable enough for libraries to keep it alive.

gojomo 1431 days ago

Agreed, bulk collection gets dominated by crap, which individually has little value.

But there are some absolutely essential, priceless diamonds hidden in the crap. And they can't be found/known at the time of collection: only with the future development of other events & knowledge do they become retroactively evident. So you've got to collect & preserve as much as you practically can, or else great things are lost forever.

Further, even the mounds/magnitudes of crap can turn out to be important for understanding the past. Ads that annoyed readers at the time help communicate how people, & businesses, & technology were really operating – not just the self-serving stories people craft later. The most-fumbling and awkward early uses of a new medium – hypertext, or RealAudio, or Shockwave Flash, or whatever – reveal enduring lessons about the evolution of technology & culture, including roads-not-taken that could still hold promise.

This shouldn't surprise us. Much of what we know of past civilizations comes from archeologists studying trash dumps that, via dumb luck, were well-preserved.

So if you tell me, "the Wayback Machine is a giant unedited trash heap of the internet", my response is: "Yes! That's the point! You get it!"

Kye 1433 days ago

Some people discover much too late that there are some things they wish they could take back. Often before trying to get a better job or when trying to escape an abuser. Given the ramping up of attacks (legal and otherwise) on queer people, this is going to be a huge issue over the next decade or so.