by pabs3 1968 days ago
Any idea about the license for the original data?

The paper links to the McGill TSP speech database (English & French) as one of the sources of the data, which claims to be BSD licensed:

http://www-mmsp.ece.mcgill.ca/Documents/Data/

The other source of data mentioned in the paper is the NTT Multi-Lingual Speech Database for Telephonometry, which seems to be commercial, so presumably under a proprietary license.

https://www.ntt-at.com/product/multilingual/
https://www.ntt-at.com/product/speech2002/

Hmm, OTOH, the 6.4GB data tarball says that it is from contributors who responded to the demo and is licensed under CC0.
+1, that data is CC0, and I believe that's all the data that was used for training.
No, none of that data was used for training. Training was done before the demo that asked for noise contributions. The contributions are CC0, but they were never used (i.e. the quality of that dataset is completely unknown).
So far we have 3 ideas!