by smarx007 1574 days ago
ArchiveTeam sends archives to Internet Archive but the two are not related. I don't think you confused the two but I mention this every time just in case.

The Warrior is a small Docker image that downloads files via your ISP connection and forwards them to the AT servers. No need for large drives.

For my personal use, I have a home server install of https://github.com/ArchiveBox/ArchiveBox and for that one you may want to get some storage, though I prefer to host its data on the SSD for performance reasons (my archive grows approx. 5000 items or 150GB per year). It's like a private Internet Archive on your home network.

Thanks, it's always good to point that out.

There's a surprising number of tools that can submit data to the Internet Archive (and fetch data from it). Even wget can produce WARC archive files.
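
For anyone who wants to try the wget route, here's a rough sketch in Python. It assumes a wget build with WARC support is on your PATH; the helper names and the example URL are mine, purely for illustration, and the flag selection is just one reasonable starting point.

```python
import shutil
import subprocess


def build_wget_warc_command(url, warc_name):
    """Return the argv list for a wget run that also writes a WARC file.

    Both function names here are hypothetical helpers, not part of any tool.
    """
    return [
        "wget",
        "--page-requisites",         # also fetch images, CSS and scripts the page needs
        f"--warc-file={warc_name}",  # base name; wget writes <warc_name>.warc.gz by default
        url,
    ]


def archive(url, warc_name):
    """Run wget and raise if it is missing or exits with an error."""
    if shutil.which("wget") is None:
        raise RuntimeError("wget not found on PATH")
    subprocess.run(build_wget_warc_command(url, warc_name), check=True)


if __name__ == "__main__":
    # Only prints the command; call archive(...) to actually download.
    print(" ".join(build_wget_warc_command("https://example.com/", "example")))
```

Building the command separately from running it makes it easy to check what will be executed before pointing it at a real site.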

While the Warrior downloads content over your own line (a bit like a residential proxy network), I do think it's important that we decentralize the storage as well.

Just without the crypto mafia/drug traders/investors.

AFAIK you can use IPFS (& clusters[0]) without relying on the crypto parts of that ecosystem. That ought to fit rather well with the use case.

[0] https://cluster.ipfs.io/

Yes there are some really interesting projects, also in the ML replicability space.

One really nice approach is the DAT project [1]. The protocol [2] looks pretty sensible and useful. Unfortunately, the tooling has been in such a state of permanent flux (i.e. perpetual deprecation) that I've never bothered to invest much time.

[1] https://datproject.org/

The last time I tried to do anything with or for the Archive Team, it was mostly a "just watch us work" sort of deal.

The tools couldn't be built without additional knowledge that wasn't published anywhere, because what was published had drifted from what actually worked and those changes never got folded back in. And there were multiple versions and variants of the tools, with different teams using different ones.

And once you built the tools, you couldn't get your Warrior into the list to be used, although you could always run your systems separately.

It's not like you could sign up for a SETI@Home type initiative and just let your equipment run.

I understand why they work this way. It's a very insular crowd, and new people and resources seem to disappear as quickly as they showed up.

So, they let you watch.

If you stay around long enough (months? years?), then they might let you start participating. But I wasn't willing to wait that long.