| HN Mirror


by Cameron_D 4295 days ago

ArchiveTeam has been. At first we were grabbing full pages and images and storing them, but wound up with IP bans (not unexpected), so a couple of people went through and grabbed the first 500 million images directly from CloudFront. They're still sitting on that 55 TB of data.

Following that, TwitPic stopped showing images on their site and required signed requests to load images from CloudFront, so the remaining 300 million images can't be fetched yet.

Today TwitPic restored the images to their site, so AT is stepping back and rewriting their scripts to properly grab pages, images, and metadata. The new grab will start from the most recent image and work backwards, storing everything properly and removing the earlier grabs as we replicate them.

In the end the data will probably reside in offline storage at the Internet Archive until something happens to the TwitPic site.

1 comment

toomuchtodo 4295 days ago

Props to the #quitpic team for working on this.
