Hacker News
by Mandelmus 941 days ago
Can somebody who knows their stuff about voice synthesis explain to me why ChatGPT sounds like it's speaking German with an American accent when I speak to it in German? It's really surprising how believably it sounds like a native speaker of American English speaking excellent, but not accent-free, German.

I would imagine that voice synthesis models would somehow be trained on data from native speakers, so why the accent?

5 comments

I am going to guess that 99% of the source data, models, training, correction, and verification is translated by Americans.

Translating a language is difficult; on top of that, regional dialects create additional complications.

If someone talks to me long enough, I can identify their birth state based on their American accent. Every state pronounces certain words differently, which can leak location data.

Guessing you're Californian, educated parents, about 34 years old. No siblings.

Not sure how you arrived at that guess, but generally I guess that describes the largest demographic of HN users.

Without AI but based on some freely available statistics combined with post history, we can say definitively (assuming honesty on OP’s part) that OP is 35 or 36, and was either born in Germany or moved there when they were young, and they likely still live there or at least have a strong affinity to Germany.

We can reasonably speculate that the fact that they speak German and likely live there, as well as the standard of their English, drastically increases the likelihood that their parents were educated. Germany’s current total fertility rate (TFR) is 1.5. It was likely higher when OP was born back in 1986/7, but I couldn’t immediately find any data on that and didn’t look too hard. Given that TFR is a population mean, this suggests a high proportion of single-child families, and generally the more educated one is, the fewer children one has, so it’s a decent guess that OP is an only child (apparently more than 50% of German families have only one child), but I don’t see anything that would make this a sure-fire bet.

I was able to complete some further analysis which could reveal more likely truths about OP such as gender, political orientation, sexuality, etc., but I think this goes far enough without starting to doxx them.

It’s a good question. I’m not sure of exactly the answer, but I suspect the answer is similar to the answer “why do Americans speak German with an American accent?”

If you learn how to pronounce specific vowels, consonants, etc. in a particular way, it takes a LOT of effort to learn how to pronounce these in a different way. You can approximate to a good extent, but most researchers say that if you don’t develop this skill as a child, you won’t ever be able to pronounce things in a way that sounds like a native speaker.

Presumably, the models have been exposed to significantly more American accents than other accents, and learning to pronounce phonemes with subtle differences, without settling for a “close enough” approximation, is a big challenge, especially since there is already a threshold of acceptability at which you still sound like a native English speaker.

I expect that the training data for the voice AI is overwhelmingly English. Native speakers of other languages will be a small minority. I'm sure it will improve over time.

> models would somehow be trained on data from native speakers, so why the accent?

You wouldn't believe it, but the models haven't been trained yet. As usual.

The same thing happens with Polish: it's your typical Polish Chicago resident.

Can you make a comparison between Polish / Polish Chicago and any two US accents?

My immediate assumption was that Chicago has a high Polish population, not necessarily that it's any different from other Polish American ones.

edit: It would appear so:

> Chicago is a city sprawling with Polish culture, billing itself as the largest Polish city outside of Poland, with approximately 185,000 Polish speakers, making Polish the third most spoken language in Chicago.