by punchingwater 2905 days ago
We also keep the README in the repo: https://github.com/mozilla/voice-web/blob/master/docs/corpus...

Thanks! I couldn't find the source of the README in the zip file. Can you talk about what the update process for this file is? How often is it updated? Is there a way to download just the new files? Is there a tarball script for this in the repo somewhere?

I see that you have instructions for S3. Are the files actually backed by S3? Is it possible to download them from S3 (possibly using requester pays)?

We have no plans to allow users to download the "raw" data from S3 (i.e., before we perform the train/dev/test split). But we want to eventually build some tools to automate this. See here for some background:

https://discourse.mozilla.org/t/the-mozilla-guarantee-publis...