Hacker News new | ask | show | jobs
by jamestimmins 769 days ago
With this capability, how close are y'all to it being able to listen to my pronunciation of a new language (e.g. Italian) and given specific feedback about how to pronounce it like a local?

Seems like these would be similar.

6 comments

It completely botched teaching someone to say “hello” in Chinese - it used the wrong tones and then incorrectly told them their pronunciation was good.
As for the Mandarin tones, the model might have mixed it up with the tones from a dialect like Cantonese. It’s interesting to discover how much difference a more specific prompt could make.
I don't know if my iOS app is using GPT-4o, but asking it to translate to Cantonese gives you gibberish. It gave me the correct characters, but the Jyutping was completely unrelated. Funny thing is that the model pronounced the incorrect Jyutping plus said the numbers (for the tones) out loud.
Not that different at all.
I think there is too much focus on tones in beginning Chinese. Yes, you should get them right, but no, you'll get better as long as you speak more, even if your tones are wrong at first. So rather than remember how to say fewer words with the right tones, you'll get farther if you can say more words with whatever tones you feel like applying. That "feeling" will just get better over time. Until then, you'll talk as good as a farmer coming in from the country side whose first language isn't mandarin.
I couldn’t disagree more. Everyone can understand some common tourist phrases without tones - and you will probably get a lot of positive feedback from Chinese people. It’s common to view a foreigner making an attempt at Mandarin (even a bad one) as a sign of respect.

But for conversation, you can’t speak Mandarin without using proper tones because you simply won’t be understood.

That really isn't true, or at least it isn't true with some practice. You don't have to consciously think about or learn tones, but you will eventually pick them anyways (tones are learned unconsciously via lots of practice trying to speak and be understood).

You can be perfectly understood if you don't speak broadcast Chinese. There are plenty of heavy accents to deal with anyways. Like Beijing 儿化 or the inability of southerners to pronounce sh very differently from s.

It was good of them to put in example failures.
I don't think that'd work without a dedicated startup behind it.

The first (and imo the main) hurdle is not reproduction, but just learning to hear the correct sounds. If you don't speak Hindi and are a native English speaker, this [1] is a good example. You can only work on nailing those consonants when they become as distinct to your ear as cUp and cAp are in English.

We can get by by falling back to context (it's unlikely someone would ask for a "shit of paper"!), but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears.

That's because we think we hear things as they are, but it's an illusion. Cup/cap distinction is as subtle to an Eastern European as Hindi consonants or Mandarin tones are to English speakers, because the set of meaningful sounds distinctions differs between languages. Relearning the phonetic system requires dedicated work (minimal pairs is one option) and learning enough phonetics to have the vocabulary to discuss sounds as they are. It's not enough to just give feedback.

[1]: https://www.youtube.com/watch?v=-I7iUUp-cX8

> but it's impossible to confidently reproduce the sounds unless they are already completely distinct in our heads/ears

interestingly, i think this isn't always true -- i was able to coach my native-spanish-speaking wife to correctly pronounce "v" vs "b" (both are just "b" in spanish, or at least her dialect) before she could hear the difference; later on she was developed the ability to hear it.

I had a similar experience learning Mandarin as a native English speaker in my late 30s. I learned to pronounce the ü sound (which doesn't exist in English) by getting feedback and instruction from a teacher about what mouth shape to use. And then I just memorized which words used it. It was maybe a year later before I started to be able to actually hear it as a distinct sound rather than perceiving it as some other vowel.
After watching the demo, my question isn't about how close it is to helping me learn a language, but about how close it is to being me in another language.

Even styles of thought might be different in other languages, so I don't say that lightly... (stay strong, Sapir-Wharf, stay strong ;)

I was conversing with it in Hinglish (A combination of Hindi and English) which folks in Urban India use and it was pretty on point apart from some use of esoteric hindi words but i think with right prompting we can fix that.
In the "Point and learn Spanish" video, when shown an Apple and a Banana, the AI said they were a Manzana (Apple) and a Pantalón (Pants).
No, I just watched it closely and it definitely said un platano
I re watched it a few times to ensure it said plátano before posting, and it honestly doesn't sound like it to me.
I'm a Spaniard and to my ears it clearly sounds like "Es una manzana y un plátano".

What's strange to me is that, as far as I know, "plátano" is only commonly used in Spain, but the accent of the AI voice didn't sound like it's from Spain. It sounds more like an American who speaks Spanish as a second language, and those folks typically speak some Mexican dialect of Spanish.

> "plátano" is only commonly used in Spain

The wiktionary page for "plátano" has a map illustrating how various Spanish-speaking countries refer to the banana.

https://en.wiktionary.org/wiki/pl%C3%A1tano#/media/File:Porp...

My principal association with plátano is plaintain, personally, but I am not a Spanish speaker.

I was about to comment the same thing about the accent. Even to my gringo ears, it sounds like an American speaking Spanish.

Plátano is commonly used for banana in Mexico, just bought some at a Soriana this weekend.

Interesting, I was reading some comments from Japanese users and they said the Japanese voice sounds like a (very good N1 level) foreigner speaking Japanese.
I thought "plátano" is only used for plantains in Latin America, and Cavendish is typically called "banana" instead. I'm likely wrong, though.
I'm from Colombia and mostly say "plátano".
Good to know. I thought Colombians said "banano". That's what a Colombian friend of mine says.
plátano is used in several Spanish-speaking countries, such as Mexico and Chile.
The italian output in the demo was really bad.
I'm a native Italian speaker, it wasn't too bad.
The content was correct but the pronunciation was awful. Now, good enough? For sure, but I would not be able to stand something talking like that all the time
Do you not have to work with non-native speakers of whatever language you use at work?
Most people don't, since you either speak with native speakers or you speak in English mostly, since in international teams you speak in English and not one of the native languages even if nobody speaks English natively. So it is rare to hear broken non-English.

And note that understanding broken language is a skill you have to train. If you aren't used to it then it is impossible to understand what they say. You might not have been in that situation if you are an English speaker since you are so used to broken English, but it happens a lot for others.

Why would you say "really bad"?
It doesn't have hands.
"I Have No Hands But I Must Scream" -Italian Ellison
This was the best joke I’ve heard this year.
So good!
Joke of the day right there :-)
Which video title is this?
Found it in a reel, I’m guessing it’s in the keynote: https://www.instagram.com/reel/C662vlHsGyx/

The Italian sounded good to me.

It sounds like a generic Eastern European who has learned some Italian. The girl in the clip did not sound native Italian either (or she has an accent that I have never heard in my life).
also wondering.
Shared in a reply to my comment.