Hacker News
by bentpins 4010 days ago
This was in progress: 830 GB had been downloaded before a Sourceforge guy popped into the IRC channel and said he's OK with the archiving, but that robots.txt should be respected. That would bring things to a practical standstill, so the downloading was paused. I'm not really sure what's happened in the week since.

Right now Xfire's videos, several URL shorteners' links, and Toshiba support material are being archived. If you have spare cycles and bandwidth and want to contribute, running an instance of the "ArchiveTeam Warrior" is pretty easy through Docker or a VM: http://archiveteam.org/index.php?title=Warrior

2 comments

Honestly, I think ignoring robots.txt is acceptable in this case. Even if he writes code to respect robots.txt, once the management at Sourceforge gets wind of what he is doing, what is stopping Sourceforge from putting up a robots.txt that blocks him everywhere?
Look at their current robots.txt; they're already prohibiting robots from crawling the actual source code: http://sourceforge.net/robots.txt
Sourceforge doesn't host the binaries itself. Universities and others (like HEAnet) offer mirrors for free!
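
For anyone wondering what "respecting robots.txt" looks like in a crawler, here is a minimal sketch using Python's standard urllib.robotparser. The rules, the "ExampleArchiver" user agent, and the example.org URLs are all made up for illustration; this is not Sourceforge's actual robots.txt and not ArchiveTeam's code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, invented for this example.
# It is NOT a copy of any real site's file.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /p/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical URLs: one under the disallowed prefix, one outside it.
    for url in (
        "https://example.org/p/demo/code/",
        "https://example.org/projects/demo/",
    ):
        verdict = "allowed" if is_allowed(SAMPLE_ROBOTS_TXT, "ExampleArchiver", url) else "blocked"
        print(url, verdict)
```

A crawler that honors the file would skip every URL this check rejects, which is why a broad Disallow rule can bring an archiving job to a standstill.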

So the mirrors should just revoke Sourceforge's upload (write) permission and hand it over to archive.org or ArchiveTeam.