

by pnash 2893 days ago

Nine years ago, my late wife developed a tumor in her throat next to her vocal cords. She was fighting cancer while trying to be a mom to our 3 young boys. Directed radiation treatment was ruled out for this tumor, leaving surgery as the only viable option. The downside was the very real risk of her permanently losing her voice.

Hoping that she’d one day beat the cancer but knowing she might be left without a voice, I came up with the idea in 2009 of trying to “capture” it, in the hope that it could be algorithmically rebuilt in the future. I reached out to a number of individuals who ultimately put me in touch with a research group that had a proprietary setup for capturing samples and rebuilding the voice. Over the Thanksgiving break, I managed to get access to a soundproof recording room, and they worked with my wife to capture samples over a period of 4 hours.

Having worked in the infosec space since the 90s, my first reaction is often to ask how new tech/innovation can be used to bypass a control and how one could detect/prevent that. It’s easy to lose sight of how something like this could fundamentally change a person’s life.

2 comments

yomly 2893 days ago

This is a great post, although I am sorry for the experiences you went through to acquire this perspective.

Thinking more about the specific use-case you have in mind, I find myself wondering how sentiment and inflection might be conveyed by a synthetic voice. Would it be inferred from context? How would that inference deal with things like sarcasm/irony? I wonder if there could be some input mechanism for controlling the inflection - what would that input interface look like? Could it go off facial expression?

I wonder where the existing tech sits in the uncanny valley for this space...


asdf1011 2892 days ago

Take a listen to the samples from "Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis" by the Tacotron team. It's pretty compelling. https://google.github.io/tacotron/publications/speaker_adapt...
