by daanzu 2201 days ago
I wrote a simple little Python GUI app to record training audio. Given a text file containing prompts, it chooses a random selection and ordering of them, displays them for the user to dictate, and records the dictation audio to a .wav file and the metadata to a recorder.tsv file. You can select a previous recording to play it back, delete it, and/or re-record it. It comes with a few selections of sentences designed to cover a broad, diverse range of English (Arctic, TIMIT). Pretty simple and no-nonsense.
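
For anyone curious what the prompt selection and metadata logging amount to, here is a rough sketch. It is not the app's actual code: the function names, the `seed` parameter, and the two-column TSV layout (wav file name, prompt text) are assumptions made for illustration.

```python
import csv
import random
from pathlib import Path


def choose_prompts(prompts_path, count, seed=None):
    """Return up to `count` prompts in random order, without repeats."""
    lines = Path(prompts_path).read_text(encoding="utf-8").splitlines()
    prompts = [line.strip() for line in lines if line.strip()]
    rng = random.Random(seed)
    # random.sample picks without replacement, and the result is already
    # in random order, so selection and ordering happen in one step.
    return rng.sample(prompts, min(count, len(prompts)))


def append_metadata(tsv_path, wav_name, prompt):
    """Append one row per recording. The column layout is an assumption."""
    with open(tsv_path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerow([wav_name, prompt])
```

Passing a fixed `seed` makes a session reproducible, which can be handy when re-recording the same set of prompts.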

https://github.com/daanzu/speech-training-recorder

Originally intended for recording data to train speech recognition models [0], it should work just as well for recording data for speech synthesis.

[0] https://github.com/daanzu/kaldi-active-grammar

1 comment

Did you figure out why half of Shervin’s audio was empty? I would hesitate to recommend this if there’s still a chance half of the data isn’t usable after recording.
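
Whatever the cause turns out to be, a quick post-recording check would at least catch empty files before they end up in a training set. A minimal sketch, assuming the recordings are 16-bit PCM .wav files; the function names and the `threshold` value are made up for illustration and would need tuning against real recordings:

```python
import array
import sys
import wave


def peak_amplitude(wav_path):
    """Return the largest absolute sample value in a 16-bit PCM .wav file."""
    with wave.open(str(wav_path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return max((abs(s) for s in samples), default=0)


def is_effectively_empty(wav_path, threshold=100):
    """True if the file never rises above `threshold` (an assumed cutoff)."""
    return peak_amplitude(wav_path) <= threshold
```

Running `is_effectively_empty` over every .wav file in the output directory after a session would flag the recordings worth re-doing.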