by taejavu 3394 days ago
You've misunderstood what you're listening to. I suggest reading the post again.

The recordings at the bottom are just recordings of an old lady and a young woman.

Yeah, I understood that. The ones in the middle are generated using their voices. You don't find that amazing?
I mean, it's sort of amazing, but it wasn't completely generated by machine. Those sound clips in the middle were generated by copying the inflections from actual recordings, not by generating the inflections from scratch. It sounds like their current system produces something like the robotic voices at the very top.
It's not TEXT to speech; it's speech to speech. I think it will be amazing when we have TTS of that quality.