Hacker News
by clan 2202 days ago
Are there any simple how-tos anywhere that describe the process in as simple terms as possible, without needing to know the cool toolkits du jour?

Something like:
- Download these texts
- Record in WAV at 48 kHz or higher
- Record each line in a separate file
- Do 3 takes of each line: flat, happy, despair
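A recipe like this is almost scriptable. Here is a minimal sketch in Python (standard library only) that builds the per-take file list and checks a finished WAV against the 48 kHz floor. The lineNNNN_style.wav naming scheme and the function names are hypothetical, not a convention of any particular toolkit.

```python
import io
import wave

# Take styles and minimum rate copied from the example protocol (assumptions).
STYLES = ("flat", "happy", "despair")
MIN_SAMPLE_RATE = 48_000  # "48 kHz or higher"


def recording_plan(lines, styles=STYLES):
    """Yield (filename, text, style): one WAV file per line per take.

    The naming scheme is hypothetical, not a toolkit standard.
    """
    for i, text in enumerate(lines, start=1):
        for style in styles:
            yield f"line{i:04d}_{style}.wav", text, style


def check_wav(f, min_rate=MIN_SAMPLE_RATE):
    """Return a list of problems with a WAV file (path string or binary file object)."""
    problems = []
    with wave.open(f, "rb") as w:
        rate = w.getframerate()
        if rate < min_rate:
            problems.append(f"sample rate {rate} Hz < {min_rate} Hz")
        if w.getnframes() == 0:
            problems.append("no audio frames")
    return problems


def silent_wav(rate, seconds=0.1, channels=1):
    """Build an in-memory 16-bit silent WAV, handy for trying out check_wav()."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(2)  # 16-bit samples
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * channels * int(rate * seconds))
    buf.seek(0)
    return buf
```

A minimal set could pass styles=("flat",) and a short list of lines; a full set would use all three styles over the whole text. check_wav only inspects the header and frame count; it says nothing about noise, clipping or mic placement.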

Maybe even a minimal set and a full set depending on how much effort you are willing to put in.

A plain description of how to capture a raw base recording that, within reason and current technology, could serve as a baseline for the most common toolkits.

I have looked into this myself (for fun), but I felt I needed a very good understanding of the toolkits before even starting to feed in data. And for my admittedly unimportant use, it seemed a huge investment to create a corpus I was not even confident would work. I ended up taking the low road and used an existing voice.


Not really. This is the only thing I know of on data collection: https://www.isca-speech.org/archive/Interspeech_2018/pdfs/24... Usually you base your recipe on the ones for existing datasets (TIMIT, WSJ, LibriSpeech, etc.).
For recording training audio:

https://github.com/daanzu/speech-training-recorder

The recorder works with Python 3.6.10. You also need to pip install webrtcvad.
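I don't know exactly how speech-training-recorder uses webrtcvad, but if you want to run the same voice activity detector over your own takes (e.g. to trim silence), its is_speech() call takes 16-bit mono PCM at 8, 16, 32 or 48 kHz in frames of 10, 20 or 30 ms. A minimal stdlib sketch that prepares audio in that shape; the helper names are hypothetical:

```python
import wave

# Input constraints of webrtcvad's Vad.is_speech(): 16-bit mono PCM,
# at one of these sample rates, in frames of 10, 20 or 30 ms.
VAD_RATES = (8000, 16000, 32000, 48000)
VAD_FRAME_MS = (10, 20, 30)


def split_frames(pcm, rate, frame_ms=30):
    """Split 16-bit mono PCM bytes into fixed-length frames for the VAD.

    A trailing partial frame is dropped, since only full 10/20/30 ms frames are accepted.
    """
    if rate not in VAD_RATES or frame_ms not in VAD_FRAME_MS:
        raise ValueError(f"unsupported: {rate} Hz, {frame_ms} ms")
    frame_bytes = rate * frame_ms // 1000 * 2  # 2 bytes per 16-bit sample
    return [pcm[i:i + frame_bytes]
            for i in range(0, len(pcm) - frame_bytes + 1, frame_bytes)]


def read_mono16(path):
    """Read a WAV file and return (pcm_bytes, rate); reject anything but 16-bit mono."""
    with wave.open(path, "rb") as w:
        if w.getnchannels() != 1 or w.getsampwidth() != 2:
            raise ValueError("expected 16-bit mono PCM")
        return w.readframes(w.getnframes()), w.getframerate()

# With webrtcvad installed, per-frame speech flags could then be computed as:
#   import webrtcvad
#   vad = webrtcvad.Vad(2)  # aggressiveness, 0 (least) to 3 (most)
#   pcm, rate = read_mono16("take.wav")
#   flags = [vad.is_speech(frame, rate) for frame in split_frames(pcm, rate)]
```

Conveniently, 48 kHz is one of the rates the VAD accepts, so mono 48 kHz 16-bit takes can be fed to it without resampling.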