by doctorpangloss
888 days ago
I believe your use should be protected. This is not meant to be a takedown, but better you hear it from me, because you'll never hear it from Discord.

> "We are working only with properly licensed"

...versus:

> "fair use"

You're smart enough to know these are very different things: saying you believe you are protected by fair use, and claiming that the data is "properly licensed." Legally there is a colossal difference. You went from saying you were authorized to use something to saying you believe you are unauthorized but still permitted due to fair use.

Our model is not a derivative of Whisper but we do use the transcripts (and encoder outputs) in our data preprocessing. I am convinced this data cannot be regarded as a derivative of the (proprietary) Whisper training set. Whisper itself is open-source (MIT) and has no special TOS (unlike, say, ChatGPT).
WhisperSpeech itself is trained on mostly Public Domain and some CC-BY-SA recordings. We are working on adding more data and we will provide clearer documentation on all the licenses.