by bunderbunder 2632 days ago
I'm less worried about the curiosity of people decades or centuries in the future than I am about the privacy interests of former Google+ users right now. Many of them chose that social network specifically because it was supposed to offer them better privacy protections. That isn't a litmus test for whether or not they'd be OK with having their profile scraped and archived, but it is at least suggestive.

It would be polite if ArchiveTeam were to now contact everyone whose profile they have scraped, and ask for permission to retain that data. And then delete all the profiles for which they didn't get affirmative consent.

2 comments

The content IA are archiving was public and publicly accessible.
Well, for one, ArchiveTeam is unaffiliated with Internet Archive. Judging by the website, it's more closely associated with 4chan or Encyclopedia Dramatica or something.

For two, being public and publicly accessible doesn't mean it isn't gauche to scrape it. It's kind of like how nobody sticks a sign that says "take one only" over the bowl of mints at a restaurant; it's assumed you just know that it's not cool to stick a whole handful of them into your pocket.

AT are not affiliated with IA, but work closely with them, and the data AT collects is transferred to IA. Therefore, my characterisation of the data as "archived by the Internet Archive" is substantively accurate.

IA aren't scraping the data themselves, but they're the customer.

As for your second point: there's merit to that argument, and I've discussed it previously -- I'm very much of two minds on this. A few considerations weigh strongly, however.

0. The data were already public, as noted.

1. The system shutdown was not a known factor when most of the data were created. The expectation at that time was that the data would continue to exist.

2. The shutdown itself has occurred in a context in which individuals, and far more importantly groups, quite literally could not archive the relevant data themselves. Google's own Data Takeout, whilst fairly remarkable (in a positive sense) within the industry, makes many things difficult or impossible. Ordinary users could not archive Community content, and even Owner and Moderator roles within communities could only archive posts from public communities -- neither comments nor private communities were archivable. (Third-party tools could provide these capabilities.) Moreover, users or communities constrained by technical skill, cost, bandwidth, or storage largely had no viable options for saving their own legacies.

3. The contents sitting on Google+, indexed and searchable both on the site and via the public Web, were far more visible than they will be at the Internet Archive, which does not support full-text search of its archives (at least not yet), and which is not as effectively indexed publicly as Google+ was.

4. The Internet Archive does provide for content removal under the DMCA, as well as other mechanisms. For a G+ user, given how content URLs are constructed (they all include the user's G+ UUID as a common element), requesting removal of an entire tree is trivial.

On balance, this favours the Archive.

I can't feel sorry for someone who goes to Google for better privacy protections. I mean, really?