Hacker News
Ask HN: AI that allows you to make phone calls in a language you don't speak?
22 points by VictorPenJust 894 days ago
Imagine this: You're trying to book a table at a sushi restaurant in Tokyo over the phone, but you don’t speak Japanese. With this hypothetical software, you could make the call in English, and the software would translate and synthesize your voice into Japanese in real-time. So, you would speak in English, and the restaurant would hear everything in Japanese.

This idea came to me during my travels in Japan and China, where English is not commonly spoken in many places. It's incredibly challenging to navigate without knowing the language. We often had to rely on our hotel's front desk to help us make reservations and contact support services.

I can envision other applications for this technology, such as in call centers, for international business calls, meetings, etc.

What do you guys think? Thanks in advance!

12 comments

About six years ago I saw a tourist ordering in a restaurant by typing on his smartphone, translating, and using a classic text-to-speech function while showing the screen to the waitress, and it worked pretty well for him. Since then I have seen more and more people doing it. They mostly use common, stock sentences.

The problem comes with a real conversation between two people. Automated text translation is still not reliable: it can even reverse the meaning, and it misses nuance, so for now (and for the next few years) it needs active supervision.

With voice, some delay is needed to capture sentence context even when the source and target languages share structures, and the delay becomes mandatory when their structures differ. A general delay would also be advisable so the two parties don't hear two voices at the same time. I'm not sure real-time translation like the kind we see in Star Trek can be done (at least not voice to voice).

Important note: I would not recommend popularizing the synthesis of our personal voices. Voices derived from some reference models would be much better.

Samsung has already announced that live translation of calls will be coming to their next phones:

> AI Live Translate Call will soon give users with the latest Galaxy AI phone a personal translator whenever they need it. Because it’s integrated into the native call feature, the hassle of having to use third-party apps is gone. Audio and text translations will appear in real-time as you speak, making calling someone who speaks another language about as simple as turning on closed captions when you stream a show. Because it’s on-device Galaxy AI, you can trust that no matter the scenario, private conversations never leave your phone.

https://news.samsung.com/global/a-new-era-of-galaxy-ai-is-co...

Knowing Samsung software, I would temper expectations a little.
The thing I miss the most after switching from Samsung to a Redmi phone is the One Hand Plus app that Samsung has on their Galaxy phones. It was really a treat and a productivity boost. I don't know much about their other software offerings.
> and the software would translate ... in real-time

Depending on the pair of languages being translated between, isn't this literally impossible? The ordering of sentence parts is not the same in all languages (coincidentally, Japanese and English are a perfect example of how different grammar can be), so you often have to wait until you've heard the whole sentence before you can parse it and translate it into another language.
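To make the word-order point concrete, here is a minimal sketch, assuming a hypothetical streaming recognizer that emits one word at a time with punctuation attached. The helper `sentence_segments` is made up for illustration and is not part of any real product: it holds every word back until a sentence-final mark arrives, so a downstream translator (not shown) always receives the whole clause, verb included. The cost is exactly the wait described above: nothing can be spoken in the target language until the speaker finishes the sentence.

```python
import re
from typing import Iterable, Iterator

# Sentence-final marks we assume the (hypothetical) recognizer attaches to words.
SENTENCE_END = re.compile(r"[.!?。？！]$")


def sentence_segments(words: Iterable[str]) -> Iterator[str]:
    """Group streamed words into complete sentences.

    Nothing is yielded until a sentence-final word arrives, so a translator
    consuming this generator never sees a half-finished clause.
    """
    buffer: list[str] = []
    for word in words:
        buffer.append(word)
        if SENTENCE_END.search(word):
            yield " ".join(buffer)
            buffer.clear()
    if buffer:  # flush a trailing fragment when the stream ends
        yield " ".join(buffer)


if __name__ == "__main__":
    stream = ["I'd", "like", "a", "table", "for", "two", "at", "seven.",
              "Is", "that", "okay?"]
    for sentence in sentence_segments(stream):
        print(sentence)
```

Fed the words of "I'd like a table for two at seven. Is that okay?", it releases each sentence only once its period or question mark arrives. Streaming translators generally try to shorten this wait by committing to partial input, at the risk of guessing wrong.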

Given the above issue, how is what you're envisioning any better than just using Google Translate?

In Arabic there is no word for "uncle". There is only "dad's brother" and "mom's brother". Often an English speaker will leave out the paternal/maternal part, so the translator has to guess. If the English speaker later specifies it and the translator guessed wrong, the Arabic listener will be confused.
Is there no structure to express a parent's (or father's or mother's) brother? It may not flow well but avoids making incorrect assumptions.
In my language, there are two different words for maternal uncle and paternal uncle, so translating from English without any context might completely change the relationship.
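The uncle example suggests one way a translator could avoid guessing: detect the English terms that Arabic splits by side of the family and ask the speaker before translating. Below is a minimal sketch of that idea; the two-entry table, the `render_kinship` helper and the "ASK:" convention are all hypothetical, and it only handles "uncle"/"aunt" qualified by an immediately preceding "paternal" or "maternal".

```python
import re

# Hypothetical lookup: English kinship words that Arabic splits by side of
# the family ('amm / khal for uncle, 'amma / khala for aunt).
KINSHIP_SPLITS: dict[str, dict[str, str]] = {
    "uncle": {"paternal": "عم", "maternal": "خال"},
    "aunt": {"paternal": "عمة", "maternal": "خالة"},
}


def render_kinship(sentence: str) -> list[str]:
    """Return an Arabic term for each kinship word in `sentence`, or a
    clarification request when the English leaves the side of the family out."""
    words = re.findall(r"[a-z]+", sentence.lower())
    results: list[str] = []
    for i, word in enumerate(words):
        if word not in KINSHIP_SPLITS:
            continue
        # Only an immediately preceding "paternal"/"maternal" resolves the term.
        side = words[i - 1] if i > 0 else ""
        if side in KINSHIP_SPLITS[word]:
            results.append(KINSHIP_SPLITS[word][side])
        else:
            results.append(f"ASK: paternal or maternal {word}?")
    return results


if __name__ == "__main__":
    print(render_kinship("My uncle is visiting."))
    print(render_kinship("My maternal uncle is visiting."))
```

Asking instead of guessing trades a little friction for avoiding the silent error described above, which is the kind of active supervision mentioned earlier in the thread.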
This is the kind of failure of expectations that Star Trek’s universal translator has set us up for.

As far as I know you’re right, a number of things either don’t translate, or are too contextual for real time translation.

I’d love to be proven wrong, but I think the underlying problem here isn’t necessarily the language itself, but differences in underlying mental models that the languages express.

> This idea came to me during my travels in Japan and China, where English is not commonly spoken in many places.

I got the same idea, but from Douglas Adams ;)

Babel fish
You can do that with Whisper: https://github.com/openai/whisper. There's even a fast Whisper that runs like a charm on the CPU of my old 2012 iMac: https://github.com/FamousDirector/FastWhisper
It's extra fun to make a loop: English -> foreign language -> English -> ...
Is Whisper capable of real-time translation?
Only from some other languages into English, not from English into other languages.

Also, it doesn't do speech synthesis (i.e. text to voice).

So, mostly no.

I’ve been keeping my eye on the Seamless M4T streaming project from Meta. Although I haven’t gotten it to run locally yet (mostly due to lack of time), I think it has the potential to allow things like real-time phone calls. My end goal is to have system-level, real-time translated transcriptions (for video conferences, etc).
Pixel buds and phone let you do this for face to face conversations, but I don't know how well it works: https://www.technologyreview.com/technology/babel-fish-earbu...
Not a wicked, googly, Jiminy Cricket concept. A web search for the terms lilliputing "pocket translator" shows quite a few options.

  https://itranslate.com/features/camera-translations

  https://blog.google/products/translate/see-world-in-your-lan...

  https://translate.google.com/about/
Didn't Google showcase this feature a couple of years ago? Of course, it's Google, so who knows if it ever got into production.
They did, at Google I/O in 2018 or so. The voice AI was supposed to make calls and book appointments at a hairdresser. They never released it. It was only there to bump the stock price.
Samsung will beat Apple. Good for them.
This will be the worst. We get enough spam calls as it is. Soon we’ll stop being able to use “broken English” as a clue that it’s a scam, and even hearing our parents' voices won’t be evidence that it’s a real call.
Just look at it like the dam has already broken, but the wave of destruction isn't here yet. Also, the voice will sound like your grandparents or your uncle, probably because they used some malware app on their phone that stole their voice imprint and sold it off.
I was about to comment that this isn't AI, then realized that it's a pointless distinction now. Whether something is AGI will still be meaningful for knowing when we've gotten there; until then, all AI/ML tech will simply keep being called AI, or computer, agent/assistant, or whatever.