| Post in r/datahoarder if you haven't already; folks there take data archiving pretty seriously. Hopefully you're already using Tor. I'm assuming that all the files you have are intended to be public. If they're not, only host and store encrypted versions, using a unique key for each
file so that you have the option to provide keys on a need-to-know basis.

I recommend having three tiers of storage (archive, online, and serving).

For the first (archive) tier, keep your primary backups on geographically diverse offline storage if possible; it should be enough to find a few people you trust in various countries to store complete copies, since ~12TB hard drives are pretty affordable. Checksum everything, sign the list of checksums, and make that signed list available so folks can verify that all their files are still intact. I don't have a ton of experience with using offline hard drives for longevity, but I would expect that if everyone turns on their drive and verifies the whole archive once a year, you'll have a very low chance of losing any files. Check the Backblaze hard drive reliability posts for suggestions of good model numbers. Some individual disks will die, but they can be replaced and re-replicated from another source (probably the second tier). The goal of this tier is to not lose everything due to hacking or other attacks or disasters.

The second tier is the online storage. Cloud bucket storage (AWS, GCS, B2) is expensive at $10/TB-month or more, but it is readily available globally and can be secured with pretty good access credentials. It is probably too expensive to serve from buckets directly because of outbound cloud network pricing. Local online storage is also fine for this if it has a fast internet connection. The goal for this tier is rapid replication of data to either the first or third tier to recover from data loss or spin up new mirrors.

For the third tier, I think you should reach out to large CDNs and ask them to help host the large files as a public service to democracy. Failing that, set up your torrent trackers and web servers on VPSs with rate limits to avoid huge bills or getting kicked off the provider. Large public cloud instances are also an (expensive!)
possibility but require hard-to-anonymize accounts and have pretty good abuse detection systems that will likely make it hard to repeatedly sign up for and host the same content again anonymously. Local/residential hosting in free countries on symmetric internet connections is also an option; plenty of people run Tor exit nodes successfully, so you might be able to get enough people to run trackers from home. The serving tier is what will cost the most money and time, but it is worth having at least two active mirrors (VPS hosts with copies of the website and files) at all times.

Get a few domain names in different TLDs based in different countries, each with a different registrar. This makes it harder for all of them to be taken offline at once. Keep a list of working mirrors visible on each mirror so folks know where else you are hosting if a domain goes down, and point each domain's DNS to a mirror (all modern web servers support SNI for hosting multiple TLS certificates per IP). This buys very cheap redundancy; setting up fully redundant serving behind a single domain requires something like Cloudflare, AWS/GCP/Azure load balancers, or your own custom front-ends. You could use round-robin DNS to point every domain at every mirror's IP, but when one mirror goes down, a fraction of users will get a long timeout until they try a working IP.

Keep your configuration files, scripts, website source code, etc. in git or another form of version control, and make regular backups. Be careful to keep credentials out of version-controlled files. Version control makes it easier to spin up a new VPS web host whenever necessary, to track work done on the site, and to collaborate with other admins.

Depending on the risk you perceive, and if you can trust other admins, split administrative duties up between multiple people so that no one person has administrative control (including passwords, hardware tokens, email accounts, SSH keys, etc.) over all the online resources.
If you have enough trusted people, then shift to a cell-structured network where not all admins know how to identify each other. Use hardware two-factor tokens wherever possible and watch out for targeted spear-phishing attempts. Good luck! |
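
For the "checksum everything" step above, here is a rough sketch in Python of what I have in mind; treat it as one possible approach, not the only way to do it. The function names and the `SHA256SUMS` file name are just placeholders I made up, and it assumes file names contain no newlines and that the manifest file is kept outside the archive directory so it doesn't end up hashing itself.

```python
import hashlib
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so huge files never need to fit in RAM."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root, POSIX style) to its SHA-256."""
    return {
        p.relative_to(root).as_posix(): sha256_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def write_manifest(manifest: dict[str, str], out: Path) -> None:
    """Write 'digest  path' lines (two spaces), one file per line."""
    lines = [f"{digest}  {name}\n" for name, digest in sorted(manifest.items())]
    out.write_text("".join(lines), encoding="utf-8")


def read_manifest(path: Path) -> dict[str, str]:
    """Parse a manifest written by write_manifest back into a dict."""
    manifest = {}
    for line in path.read_text(encoding="utf-8").splitlines():
        digest, name = line.split("  ", 1)
        manifest[name] = digest
    return manifest


def verify(root: Path, manifest: dict[str, str]) -> list[str]:
    """Return the names of files that are missing or whose hash changed."""
    bad = []
    for name, expected in manifest.items():
        target = root / name
        if not target.is_file() or sha256_file(target) != expected:
            bad.append(name)
    return bad
```

Whoever maintains the archive would run `build_manifest` and `write_manifest` once, then sign the resulting file as a separate step (for example with `gpg --armor --detach-sign SHA256SUMS`). Everyone holding a copy checks the signature first, then runs `verify` during their yearly spin-up; any name it returns gets re-copied from another tier.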