by clan 2202 days ago

Are there any simple how-tos anywhere that describe the process in the simplest terms possible, without assuming you know the cool toolkits du jour?

Something like:

- Download these texts
- Record in WAV at least 48 kHz
- Record each line in a separate file
- Do 3 takes of each line: flat, happy, despair
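To make that checklist concrete, here is a rough sketch of a script that checks a folder of recordings against it, using only Python's standard `wave` module. The file naming scheme (`<line_id>_<style>.wav`) and the function names are my own assumptions for illustration, not a convention any toolkit requires.

```python
import wave
from pathlib import Path

MIN_RATE_HZ = 48_000                    # "at least 48 kHz" from the checklist
STYLES = ("flat", "happy", "despair")   # the three takes per line

def expected_files(line_ids, styles=STYLES):
    """Hypothetical naming scheme: <line_id>_<style>.wav, one file per take."""
    return [f"{line_id}_{style}.wav" for line_id in line_ids for style in styles]

def check_corpus(folder, line_ids, min_rate=MIN_RATE_HZ):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for name in expected_files(line_ids):
        path = Path(folder) / name
        if not path.exists():
            problems.append(f"{name}: missing")
            continue
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
                frames = wav.getnframes()
        except (wave.Error, EOFError) as exc:
            problems.append(f"{name}: not a readable PCM WAV ({exc})")
            continue
        if rate < min_rate:
            problems.append(f"{name}: {rate} Hz is below {min_rate} Hz")
        if frames == 0:
            problems.append(f"{name}: empty recording")
    return problems
```

Calling `check_corpus("recordings", ["line0001", "line0002"])` would list any missing takes and any files recorded below the target sample rate. It says nothing about audio quality, only about the mechanical parts of the checklist.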

Maybe even a minimal set and a full set depending on how much effort you are willing to put in.

A plain description of how to capture a raw corpus that, within reason and the limits of the technology, could serve as a baseline for the most common toolkits.

I have looked into this myself (for fun), but I felt I needed a very good understanding of the toolkits before I could even start feeding in data. For my admittedly unimportant use, it seemed a huge investment to create a corpus I was not even confident would work. I ended up taking the low road and using an existing voice.

2 comments

audiohermit 2201 days ago

Not really. This is the only thing I know of in terms of collection: https://www.isca-speech.org/archive/Interspeech_2018/pdfs/24... Usually you base your recipe on the ones for existing datasets (TIMIT, WSJ, LibriSpeech, etc.).

voicevoice50 2201 days ago

For recording training audio:

https://github.com/daanzu/speech-training-recorder

The recorder works with Python 3.6.10. You also need to pip install webrtcvad.
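If you want to confirm the environment before launching the recorder, something like this pre-flight check might help. It is only a sketch: the `preflight` function is my own helper, not part of speech-training-recorder, and it only reports the Python version and whether `webrtcvad` can be found.

```python
import importlib.util
import sys

def preflight(required=("webrtcvad",)):
    """Return (python_version, missing_modules) without importing the modules."""
    version = ".".join(str(n) for n in sys.version_info[:3])
    missing = [name for name in required
               if importlib.util.find_spec(name) is None]
    return version, missing

if __name__ == "__main__":
    version, missing = preflight()
    print(f"Python {version}")
    for name in missing:
        print(f"missing: {name} (try: pip install {name})")
```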