
Right.

Google acquired Kaggle in 2017, and Appen acquired Figure Eight (formerly CrowdFlower) in 2019. Both used to be open-source-friendly places to post datasets for useful comments, analyses, and crowdsourced hacking, generally without heavy or restrictive license terms. (The UC Irvine Machine Learning Repository at https://archive.ics.uci.edu/ is also still around.) Kaggle may still be such a place, but beware of the following:

At some point Kaggle began silently removing some (commercial) datasets from useful old competitions, such as dunnhumby's Shopping Challenge 2011 [0], even though it was anonymized and only had three features. So you can't rely on the more commercial datasets staying available, either to cite or for replicability.

Also, according to [1], "you can be banned on Kaggle without any warnings or reasons, all your kernels and datasets will became inaccessible even for downloading for yourself and support will not answer you for weeks (if ever)". IME, what I'd usually heard is that bans come from (AI-based) suspicion of cheating (or of using multiple accounts to bypass submission limits, or of collusion between teams on submissions), or of post-2018 gaming and account-warming/transfer to boost rankings. But the AI can produce false positives, and it's reportedly nearly impossible to reach live human support.

Kaggle added DOIs for datasets in 2019 [2], at least for academic datasets, though not by default.

[0]: https://www.kaggle.com/c/dunnhumbychallenge

[1]: https://www.reddit.com/r/kaggle/comments/essuk1/reminder_you...

[2]: https://www.kaggle.com/discussions/product-feedback/108594