by dredmorbius 2632 days ago
Archive Team are absolute heroes. The Google+ Mass Migration community learned of them and their "googleminus" project in January, and worked to provide information on the crawl, the amount of data, and the particulars of G+.

arkiver and Fusl in particular have been absolutely amazing in what they've accomplished.

They also managed to pull in 94.5% of all Google+ Communities, which should provide the ability to view posts by Community (they're otherwise scattered among user posts). We're still assessing how much of that was login-page redirects in the last hour or two of the crawl, but it's amazing work.

I'd managed to send off about 80k of the larger, recently active Communities (100+ members, <30 day activity) over the past few weeks, with a final grab about 18 hours ago, using the Internet Archive's "save" URL.

If you ever need to use that, it's:

    https://web.archive.org/save/<URL>

Where you replace "<URL>" with whatever it is you're trying to save, including the protocol string, say, this HN post:

    https://web.archive.org/save/https://news.ycombinator.com/item?id=19556665

That can be scripted, and my submissions used a bog-simple Bash script and xargs to plow through 100k submissions (20k appear to have been dead) in about 90 minutes, on very modest hardware.
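
A minimal sketch of what such a script could look like, not the one used for the crawl. The file name urls.txt, the function names save_url_for and submit_all, the default of 4 parallel jobs, and the 60-second timeout are all assumptions for illustration. It only relies on standard curl and xargs options.

```shell
#!/usr/bin/env bash
# Sketch: submit a list of URLs to the Internet Archive's "save" URL.
# Assumptions (not from the original post): one URL per line in the input
# file, curl is installed, and xargs supports -0 and -P (GNU and BSD do).

SAVE_PREFIX="https://web.archive.org/save/"

# Build the save URL for one target; the target keeps its protocol string.
save_url_for() {
    printf '%s%s\n' "$SAVE_PREFIX" "$1"
}

# Read URLs from a file, skip blank lines, and request each save URL.
#   $1 = file of URLs, $2 = number of parallel curl processes (default 4)
submit_all() {
    grep -v '^[[:space:]]*$' "$1" |
        while IFS= read -r url; do save_url_for "$url"; done |
        tr '\n' '\0' |
        xargs -0 -n 1 -P "${2:-4}" \
            curl --silent --show-error --max-time 60 --output /dev/null \
                 --write-out '%{http_code} %{url_effective}\n'
}

# Only submit when a file is given on the command line.
if [ "$#" -ge 1 ]; then
    submit_all "$@"
fi
```

Saved as, say, save.sh, it would be run as `bash save.sh urls.txt 4`. Each output line is the HTTP status followed by the final URL, which is one rough way to spot dead submissions. Keep the parallelism low out of courtesy to the Archive.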

Also: the Internet Archive (and Archive Team) run on volunteers and donations. You can help, and please do.

https://archive.org/donate/

https://www.archiveteam.org/index.php?title=Donate

(Not affiliated, but very grateful to them.)