Great to see there's some resistance. What I'm missing from this announcement, though, is any mention of how they intend to secure this "vault" against the current government. I'm assuming good intentions on the part of Harvard, but keeping this data online against the express will of the government is gonna cost (political) capital. And from what I can see, the archive is hosted by US entities on US-controlled servers on US soil? This is the same thing that's been bothering me about archive.org lately, by the way. I haven't found a good way to simply (for some reasonable definition of "simple") contribute 10 TiB or so of redundant storage on my (European) home server either. That kind of thing might (have to) serve to ensure tamper-resistance for that data, given the current political climate on both sides of the pond. Any pointers welcome.
Maybe this?
> In addition to the data collection, we are releasing open source software and documentation for replicating our work and creating similar repositories. With these tools, we aim not only to preserve knowledge ourselves but also to empower others to save and access the data that matters to them.
https://github.com/harvard-lil/data-vault
And since the data lives here: https://source.coop/repositories/harvard-lil/gov-data/descri...
Combined with this:
> To download an individual dataset by name you can construct its URL, such as:
> https://source.coop/harvard-lil/gov-data/collections/data_go...
> https://source.coop/harvard-lil/gov-data/metadata/data_gov/f...
> To download large numbers of files, we recommend the aws or rclone command line tools:
> aws s3 cp s3://us-west-2.opendata.source.coop/harvard-lil/gov-data/collections/data_gov/<name>/v1.zip --no-sign-request
So one could "easily" mirror the whole thing, making it distributed.
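For what it's worth, here is a rough sketch of what that mirroring might look like, not a tested recipe. The bucket path is copied from the quoted command; `example-dataset` and the destination directory are placeholders I made up, and I'm assuming the bucket allows anonymous listing, which `aws s3 sync` needs. Note that `aws s3 cp` also wants a destination argument, which the quoted line leaves out. The helpers only print the commands so you can inspect them before running anything.

```shell
#!/bin/sh
# Bucket and prefix copied from the command quoted above.
BUCKET_PREFIX="s3://us-west-2.opendata.source.coop/harvard-lil/gov-data"

# dataset_uri NAME: print the S3 URI of one dataset archive, following
# the collections/data_gov/<name>/v1.zip pattern from the quoted command.
dataset_uri() {
    printf '%s/collections/data_gov/%s/v1.zip\n' "$BUCKET_PREFIX" "$1"
}

# fetch_cmd NAME DEST: print (do not run) a command that downloads a
# single dataset archive into DEST.
fetch_cmd() {
    printf 'aws s3 cp %s %s --no-sign-request\n' "$(dataset_uri "$1")" "$2"
}

# mirror_cmd DEST: print (do not run) a command that syncs the whole
# prefix into DEST. Assumes anonymous listing is permitted on the bucket.
mirror_cmd() {
    printf 'aws s3 sync %s %s --no-sign-request\n' "$BUCKET_PREFIX" "$1"
}
```

Something like `mirror_cmd /srv/mirror/gov-data` prints the sync command for a (hypothetical) local directory; rerunning the printed command later should only fetch what changed, which is what you'd want for a home-server mirror.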