Imagine this: you're trying to book a table at a sushi restaurant in Tokyo over the phone, but you don't speak Japanese. With this hypothetical software, you could make the call in English, and the software would translate your speech and synthesize it in Japanese, in your own voice, in real time. You would speak English, and the restaurant would hear everything in Japanese. The idea came to me while traveling in Japan and China, where English is not commonly spoken in many places. Getting around without knowing the language is incredibly challenging; we often had to rely on our hotel's front desk to make reservations and contact support services for us. I can also see other applications for this technology, such as call centers, international business calls, and meetings. What do you guys think? Thanks in advance!
The problem would come up in a conversation between two people. Automated text translation is still not reliable today: it can even produce the opposite of the intended meaning, and it misses nuance, so it needs active human supervision for now (and for the years to come).
With voice, a time delay is needed to capture the context of a sentence even when the source and target languages share similar structures, and the delay becomes mandatory when their structures differ. A general delay would also be advisable so that the speakers don't hear two voices (the original and the translation) at the same time. I'm not sure real-time voice-to-voice translation like what we see in Star Trek can actually be done.
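To make the delay trade-off concrete, here is a minimal sketch of the "wait-k" policy from the simultaneous machine translation literature. This is a swapped-in illustration, not something the post proposes: the translator first listens to k source words, then alternates between speaking one target word and listening to one more. The function names `wait_k_schedule` and `lag_per_target_word` are hypothetical, and the sketch counts words only; it does no actual translation.

```python
from typing import List, Tuple


def wait_k_schedule(source_len: int, target_len: int, k: int) -> List[Tuple[str, int]]:
    """Return the READ/WRITE actions of a wait-k policy (hypothetical helper).

    Target word i (0-based) may only be spoken after
    min(i + k, source_len) source words have been heard.
    """
    actions: List[Tuple[str, int]] = []
    read = 0
    written = 0
    while written < target_len:
        needed = min(written + k, source_len)
        # Listen until enough source context is available for the next word.
        while read < needed:
            read += 1
            actions.append(("READ", read))
        # Speak the next target word.
        written += 1
        actions.append(("WRITE", written))
    return actions


def lag_per_target_word(source_len: int, target_len: int, k: int) -> List[int]:
    """Number of source words heard before each target word is spoken."""
    return [min(i + k, source_len) for i in range(target_len)]


if __name__ == "__main__":
    # A 6-word sentence: small k starts speaking early,
    # k equal to the sentence length waits for the whole sentence.
    print(lag_per_target_word(6, 6, 2))
    print(lag_per_target_word(6, 6, 6))
```

When k equals the sentence length, the policy reduces to waiting for the full sentence, which is the delay described above. A smaller k cuts the lag, but the system must then commit to output before it has heard everything, for example before a sentence-final Japanese verb arrives, which is exactly where structural differences between languages hurt.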
Important note: I would not recommend popularizing the synthesis of our personal voices. Variations on a few reference voice models would be much better.