HN Mirror



	by pabs3 1968 days ago
	Any idea about the license for the original data?

4 comments

pabs3 1968 days ago

The paper links to the McGill TSP speech database (English & French) as one of the sources of the data, which claims to be BSD licensed:

http://www-mmsp.ece.mcgill.ca/Documents/Data/


pabs3 1968 days ago

The other source of data mentioned in the paper is the NTT Multi-Lingual Speech Database for Telephonometry, which seems to be commercial, so presumably under a proprietary license.

https://www.ntt-at.com/product/multilingual/

https://www.ntt-at.com/product/speech2002/


pabs3 1968 days ago

Hmm, OTOH, the 6.4GB data tarball says that it is from contributors who responded to the demo and is licensed under CC0.


ArsenArsen 1968 days ago

+1, that data is CC0, and I believe that's all the data that was used for training.


jmvalin 1968 days ago

No, exactly none of that data was used for training. The training was done before the demo that was asking for noise contributions. The contributions are CC0, but were never used (i.e. totally unknown dataset quality).


the-dude 1968 days ago

So far we have 3 ideas!
