Hacker News
by dpifke 698 days ago
From https://huggingface.co/datasets/mlfoundations/MINT-1T-HTML#l...:

We release MINT-1T under a CC-BY-4.0 license, designating it primarily as a research artifact. While the dataset is freely available, users are responsible for ensuring its legal use in commercial settings. Users must independently verify compliance with applicable laws before employing MINT-1T for commercial purposes.

The same page includes this caveat:

Potential Legal and Ethical Concerns: While efforts were made to respect robots.txt files and remove sensitive information, there may still be content that individuals did not explicitly consent to include.

1 comment

Ah yes, the "if you get busted for copyright violations it's not our problem" license.