Hacker News
by brianjking 698 days ago
What's the license though?

From https://huggingface.co/datasets/mlfoundations/MINT-1T-HTML#l...:

We release MINT-1T under a CC-BY-4.0 license, designating it primarily as a research artifact. While the dataset is freely available, users are responsible for ensuring its legal use in commercial settings. Users must independently verify compliance with applicable laws before employing MINT-1T for commercial purposes.

Same page includes this caveat:

Potential Legal and Ethical Concerns: While efforts were made to respect robots.txt files and remove sensitive information, there may still be content that individuals did not explicitly consent to include.

Ah yes, the "if you get busted for copyright violations it's not our problem" license.