by bityard 477 days ago
> Reading and understanding text is much slower than just listening.

Speak for yourself! I read _much_ faster than listening to someone saying the same thing. This is why I can't stand subtitles on videos, movies, and tv shows. Because of how my brain works, I can't help but read the text. And when it's there, I'm done reading the person's line when they are only 25-50% through speaking it. So it "feels" like I'm watching a show where everyone repeats the last half of every sentence.

> Is real-time audio interpretation in the pipeline?

When I saw the headline, I assumed the product was doing real-time translation and voice cloning in one. Now _that_ would be an interesting use of AI. (Google and others have been doing real-time voice recognition and text translation for years.)

3 comments

> This is why I can't stand subtitles on videos, movies, and tv shows. Because of how my brain works, I can't help but read the text. And when it's there, [...] so it "feels" like I'm watching a show where everyone repeats the last half of every sentence.

Ha! Translation is done in real time, but subtitles are not!! Were you thinking they are processed the same way? That's your confusion.

> I'm done reading the person's line when they are only 25-50% through speaking it.

How can an AI system translate someone when they haven't even spoken those words yet? Please check the title of the Post - it's a real-time system.

> I read _much_ faster than listening to someone saying the same thing.

Everyone reads and/or speaks at a different speed. You can pause a movie, but you can't pause a meeting the first time you hear it. You don't have to make any critical decisions while consuming entertainment, but at work you might have to listen, process, understand, connect the dots to various other subsystems, and conclude how they may or may not affect your standing. At the end, you might have to challenge the speaker or add to what they are saying. There are a lot of variables.

We are also excited about real-time translation + voice cloning (like having your K-pop stars speaking your language with their voices!). This is actually something we explored previously. The tech is there, but we weren't sure of the user experience, especially in terms of latency.

Maybe we'll have this for Cuckoo 2.0!

For this, check www.palabra.ai