by TazeTSchnitzel 4010 days ago
Why aren't you mirroring the binaries? These are vital for people in the future who do not have the time to set up a build environment for software from a decade ago.

I'd also echo the concerns of others about GitHub.

Proper archivists should do for SourceForge what they did for other projects. Archive Team, maybe? Looks like they have a wiki page: http://www.archiveteam.org/index.php?title=SourceForge


This was in progress: 830GB had been downloaded before a SourceForge guy popped onto the IRC channel and said he's OK with the archiving, but that robots.txt should be respected. That would put things at a practical standstill, so the downloading was paused. I'm not really sure what's happened in the week since.

Right now Xfire's videos, several URL shorteners' links, and Toshiba Support material are being archived. If you have spare cycles and bandwidth and want to contribute, running an instance of the "ArchiveTeam Warrior" is pretty easy through Docker or a VM. http://archiveteam.org/index.php?title=Warrior

Honestly, I think ignoring robots.txt in this case is acceptable. Even if he writes code to respect robots.txt, once the management at SourceForge gets wind of what he is doing, what is stopping SourceForge from putting up robots.txt rules everywhere to block him?
Look at their current robots.txt; they're already prohibiting robots from crawling the actual source code: http://sourceforge.net/robots.txt
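For anyone wondering what "respecting robots.txt" means for a crawler in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, host name, and user-agent string below are made up for illustration; they are not SourceForge's actual robots.txt, and this is not how the ArchiveTeam tooling necessarily does it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; NOT SourceForge's real robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /p/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    parser = build_parser(SAMPLE_ROBOTS_TXT)
    # "example-archiver" and example.org are placeholder names.
    for url in (
        "http://example.org/p/demo/code/",
        "http://example.org/projects/demo/files/",
    ):
        print(url, is_allowed(parser, "example-archiver", url))
```

A polite crawler would call something like `is_allowed` before every request, which is exactly why a site-wide `Disallow` would bring it to a standstill.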
SourceForge doesn't host the binaries itself. Universities and others (like HEANET) offer mirrors for free!

So the mirrors should just cut off SourceForge's upload (write) permission and transfer it over to archive.org or ArchiveTeam.

Regarding binaries, I know these could be useful and I'd like to provide them, but I'm afraid a "not (yet?) very popular mirroring project" has no way to show that it can be trusted with binaries. After all, if a well-known site like SF can't be trusted, why would an unknown site be any more trustworthy?
Yes, this is a more challenging and potentially risky one.

I think you're taking the right approach by capturing the code and the history. In fact, I think you're going above and beyond what most people should ask for or expect.

Seems you could just sidestep SF in this case and contact one of their mirrors directly:

http://sourceforge.net/p/forge/documentation/Join%20as%20a%2...

http://sourceforge.net/p/forge/documentation/Mirrors/

I bet at least one techie working at one of those organisations would lend a sympathetic ear to the effort, if you could find them.

Edit: running 'rsync -r' on a local mirror shows 512,000 directories from a..ju, but only 43k files. Mirroring all the downloads should be easy.
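A rough sketch of how one might tally directories versus files from such a listing. It assumes the usual layout rsync prints when listing a source without a destination (mode string, size, date, time, name); the sample lines and the `count_listing` helper are made up for illustration, not output from a real mirror.

```python
from typing import Iterable, Tuple

# Made-up listing for illustration; assumes rsync's list-only layout:
#   <mode> <size> <date> <time> <name>
SAMPLE_LISTING = """\
drwxr-xr-x          4,096 2015/06/01 10:00:00 a
drwxr-xr-x          4,096 2015/06/01 10:00:00 a/aa
-rw-r--r--      1,234,567 2015/05/30 09:15:42 a/aa/demo-1.0.tar.gz
"""


def count_listing(lines: Iterable[str]) -> Tuple[int, int]:
    """Return (directories, regular_files) counted from listing lines."""
    dirs = 0
    files = 0
    for line in lines:
        # Split into at most 5 fields so names containing spaces stay intact.
        fields = line.split(None, 4)
        if len(fields) < 5:
            continue  # skip blank lines, banners, or anything unexpected
        mode = fields[0]
        if mode.startswith("d"):
            dirs += 1
        elif mode.startswith("-"):
            files += 1
        # symlinks and other entry types are deliberately not counted
    return dirs, files


if __name__ == "__main__":
    print(count_listing(SAMPLE_LISTING.splitlines()))
```

In real use the lines would come from the rsync output piped in (e.g. via `sys.stdin`) rather than from a hard-coded sample.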