ArchiveTeam (AT) are not affiliated with the Internet Archive (IA), but they work closely with them, and the data AT collects are transferred to IA. Therefore, my characterisation of the data as "archived by the Internet Archive" is substantively accurate.

IA aren't scraping the data themselves, but they're the customer.

As for your second point: there's merit to that argument, and I've discussed the same previously. I'm very much of two minds on this. A few considerations weigh strongly, however.

0. The data were already public, as noted.

1. The system shutdown was not a known factor when most of the data were created. The expectation at that time was that the data would continue to exist.

2. The shutdown itself has occurred in a context in which individuals, and far more importantly groups, quite literally could not archive the relevant data themselves. Google's own Data Takeout, whilst fairly remarkable (in a positive sense) within the industry, makes many things difficult or impossible. Ordinary users cannot archive Community content, and even the Owner and Moderator roles within communities could only archive posts from public communities; neither comments nor private communities were archivable. (Third-party tools could provide these capabilities.) Moreover, users or communities constrained by technical skill, cost, bandwidth, or storage largely had no viable options for saving their own legacies.

3. The content on Google+, indexed and searchable both on the site itself and via the public Web, was far more visible than it will be at the Internet Archive, which does not support full-text search of its archives (at least not yet) and is not as effectively indexed by public search engines as Google+ was.

4. The Internet Archive does provide for content removal under the DMCA, as well as through other mechanisms. For a G+ user, given how content URLs are constructed (they all include the user's G+ UUID as a common element), requesting removal of an entire tree is trivial.

On balance, this favours the Archive.