Hacker News
by caetris1 2245 days ago
In no way do I mean to take away from the really great work of these researchers, but there is one thing here that people should be aware of. By using karaoke style lyrics, this scientific study invalidates itself and the credibility of those who went forward with publishing it. When you read the lyrics while listening to the audio, your brain will automatically convince you that the audio result is better than it is. What is the proof for this? Well, look no further than the infamous Yanny/Laurel audio clip. When you read the word "Yanny" or "Laurel" in time with the audio, your brain switches between two different auditory suggestions.

https://en.wikipedia.org/wiki/Yanny_or_Laurel

There is also a scientific precedent that refutes these findings, which is called the McGurk effect.

https://en.wikipedia.org/wiki/McGurk_effect

https://en.wikipedia.org/wiki/Speech_perception#Music-langua...

These researchers may not be to blame for this, but they really should have been honest in their conclusion.

1 comment

They concluded that their model "is capable of generating pieces that are multiple minutes long, and with recognizable singing in natural-sounding voices." Which part of that is dishonest? I would assert that being able to make sense of the lyrics is a nice bonus but not fundamentally relevant to their conclusion, in that a person can appreciate singing in a foreign language, and recognize it to be natural, without any knowledge of the words whatsoever. Besides, speech synthesis is basically solved in terms of intelligibility; that's not really the thrust of what they've achieved here.

And more to the point, a full 815 of the uploaded songs have no pre-written lyrics, so your premise that they are reliant on "karaoke style lyrics" is mistaken to begin with.