| Full Disclosure: I'm an ML Team Lead at DagsHub.
TL;DR
DagsHub lets you stream datasets, for free, from any repo you have access to. We host open-source datasets for various tasks and domains (image, video, MRI, audio, etc.) that you can use, or you can upload your own and stream them.
Learn more - https://dagshub.com/docs/feature_guide/direct_data_access/
How does it work?
Every DagsHub repo comes with configured remote storage, where users can host models, datasets, or any other large files. We recently added a new capability to our open-source client and free-to-use API that enables streaming of files stored on DagsHub Storage. With it you can access any dataset stored on DagsHub, stream it to your machine, version it, and upload it to your DagsHub repo, all from your Python code.
I think the coolest part of this feature is that it doesn't require any modifications to your code base or data format.
You can find more info about it here -> https://dagshub.com/docs/feature_guide/direct_data_access/
Feel free to reach out if you have any other questions (: |
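For anyone wondering what "no modifications to your code" looks like in practice, here's a rough sketch. It assumes the client is installed with `pip install dagshub`, that you run it from inside a clone of a DagsHub repo, and that `install_hooks` from `dagshub.streaming` is the entry point (that's my reading of the docs linked above, so double-check there). The `read_head` helper and the file path are hypothetical, just for illustration.

```python
def enable_streaming() -> None:
    """Turn on streaming so files missing locally are fetched from DagsHub Storage on access."""
    # Assumption: the `dagshub` client is installed and the current working
    # directory is a clone of a DagsHub repo. Imported lazily so the rest of
    # this file still runs without the client.
    from dagshub.streaming import install_hooks

    install_hooks()


def read_head(path: str, n_bytes: int = 16) -> bytes:
    """Ordinary file-reading code; nothing in here is DagsHub-specific."""
    with open(path, "rb") as f:
        return f.read(n_bytes)


def main() -> None:
    enable_streaming()
    # Hypothetical path: replace with a file that exists in your repo's storage.
    print(read_head("data/images/0001.png"))
```

The point is that `read_head` is the same plain `open()` code you'd write for a local file; only the one-time `enable_streaming()` call is new. Call `main()` from inside your repo clone to try it.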