by audiohermit
2202 days ago
I work in pathological speech processing/synthesis so I'm unfortunately familiar with your father's position. It really sucks that these people didn't know that archiving their voice would've been useful. I hear snippets that people manage to glean from family videos right after listening to their current voices and it makes me really sad. On the upside, your father can choose any celebrity he wants to voice him! Tons of celeb data is publicly available (VoxCeleb 1 & 2).
Something like:
- Download these texts
- Record in WAV at least 48 kHz
- Record each line in a separate file
- Do 3 takes of each line: flat, happy, despair
Maybe even a minimal set and a full set, depending on how much effort you are willing to put in.
A plain description of how to capture a raw base that, within reason and the limits of current technology, could serve as a baseline for the most common toolkits.
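To make the checklist concrete, here is a rough sketch of a script that checks a folder of recordings against it: one WAV per line and take, sampled at 48 kHz or more. The file naming scheme (`line0001_flat.wav` and so on) and the function names are my own assumptions for illustration, not something any toolkit requires. It only uses Python's standard `wave` module, so it reads plain PCM WAV files and nothing fancier.

```python
import wave
from pathlib import Path

MIN_RATE_HZ = 48_000                  # "at least 48 kHz" from the checklist
TAKES = ("flat", "happy", "despair")  # the three takes per line


def expected_files(num_lines):
    """Return the file names we expect, using a hypothetical naming scheme."""
    return [
        f"line{n:04d}_{take}.wav"
        for n in range(1, num_lines + 1)
        for take in TAKES
    ]


def check_corpus(folder, num_lines):
    """Return a list of human-readable problems; an empty list means all good."""
    problems = []
    for name in expected_files(num_lines):
        path = Path(folder) / name
        if not path.is_file():
            problems.append(f"{name}: missing")
            continue
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
        except (wave.Error, EOFError) as exc:
            # Not a PCM WAV file the standard library can parse.
            problems.append(f"{name}: not a readable WAV ({exc})")
            continue
        if rate < MIN_RATE_HZ:
            problems.append(f"{name}: {rate} Hz is below {MIN_RATE_HZ} Hz")
    return problems
```

Calling `check_corpus("recordings", 200)` would list every missing or undersampled file for a 200-line script. A "minimal" and a "full" set could then just be two different values of `num_lines`.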
I have looked into this myself (for fun), but I felt I needed a very good understanding of the toolkits before even starting to feed in data. And for my admittedly unimportant use it seemed like a huge investment to create a corpus I was not even confident would work. I ended up taking the low road and using an existing voice.