Hacker News
by sailingparrot 255 days ago
> This voice standardization model is an in-house accent-preserving voice conversion model.

Not sure this model works really well. As a native French/Spanish speaker, I can immediately recognize an actual French or Spanish person speaking in English, but the examples here are completely foreign to me. If I had to guess where the "French" accent was from, I would have guessed something like Nigeria. For example, Spanish speakers have a very distinct way of pronouncing "r" in English that is just not present here. I would have been unable to correctly guess French or Spanish for any of the ~10 examples present in each language (mayyybe 1 for French).

2 comments

It's probably an artifact of them lumping together all varieties/dialects of a given language. I don't speak Spanish, but I know that the R is one of the things that's different in e.g. Argentina.
I wonder if they have a large population of African French speakers in the dataset?
For sure the voice standardization model is not perfect, but it was important for us to do, especially for voice privacy. It’s still pretty early tech.