We release MINT-1T under a CC-BY-4.0 license, designating it primarily as a research artifact. While the dataset is freely available, users are responsible for ensuring its legal use in commercial settings. Users must independently verify compliance with applicable laws before employing MINT-1T for commercial purposes.
Same page includes this caveat:
Potential Legal and Ethical Concerns: While efforts were made to respect robots.txt files and remove sensitive information, there may still be content that individuals did not explicitly consent to include.
We release MINT-1T under a CC-BY-4.0 license, designating it primarily as a research artifact. While the dataset is freely available, users are responsible for ensuring its legal use in commercial settings. Users must independently verify compliance with applicable laws before employing MINT-1T for commercial purposes.
Same page includes this caveat:
Potential Legal and Ethical Concerns: While efforts were made to respect robots.txt files and remove sensitive information, there may still be content that individuals did not explicitly consent to include.