by identity-haver 2637 days ago
There was a claim [1] that the G+ terms of service might legally prohibit them from doing this after the service is shut down. I haven't verified it.

However, it's clear that for an archiving effort this big, people at Google are explicitly allowing it. The Archive Team crawler's user agents and fetch patterns were clearly distinct enough to get caught by an automated tool, and someone knew someone at Google who could get it unblocked.

Unfortunately, any archival effort that requires the "Warrior" crawler (and not just a guy with a 4TB disk) is at the mercy of the website's remaining staff and management. Just ask Soundcloud. Archive Team started to archive their stuff when it looked like they were going to go under, but Soundcloud shut them down.

[1] https://news.ycombinator.com/item?id=19410050

1 comment

That's a really good point. OTOH, I think it would be nearly impossible for anyone to make a claim that their privacy has been violated by archiving public posts. In that case, rights have been granted to everyone (i.e. the rights Archive Team is currently exercising without issue) so limitations on rights granted to Google itself are irrelevant. OTOOH, IANAL. ;)