by nickmcc 888 days ago
I was looking at a video on training a custom voice with Piper, following the tutorial at https://www.youtube.com/watch?v=b_we_jma220, and noticed that the datasets require a text transcript as metadata for each source audio file. This training method by Collabora seems to automate that step and only requires the audio files for training.

Yup, we are using Whisper to transcribe automatically so we can train the model on just speech recordings, without human transcripts.

This works for any language that is well supported by the OpenAI Whisper model.
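
For anyone who wants to try the same idea by hand, here is a rough sketch of generating transcripts from a folder of recordings. It is not Collabora's actual pipeline. It assumes the `openai-whisper` Python package and an LJSpeech-style `id|text` metadata file, which may not be the layout your trainer expects. The function names `whisper_transcriber` and `build_metadata` are made up for this example.

```python
from pathlib import Path
from typing import Callable


def whisper_transcriber(model_name: str = "base") -> Callable[[Path], str]:
    """Return a function that transcribes one audio file with openai-whisper."""
    import whisper  # pip install openai-whisper; imported lazily on purpose

    model = whisper.load_model(model_name)

    def transcribe(path: Path) -> str:
        # transcribe() returns a dict; the full transcript is under "text".
        return model.transcribe(str(path))["text"]

    return transcribe


def build_metadata(
    audio_dir: Path,
    transcribe: Callable[[Path], str],
    out_name: str = "metadata.csv",  # hypothetical file name
) -> Path:
    """Write one 'file_id|transcript' line per .wav file in audio_dir."""
    out_path = audio_dir / out_name
    with out_path.open("w", encoding="utf-8") as f:
        for wav in sorted(audio_dir.glob("*.wav")):
            # Collapse whitespace and drop '|' so each row stays on one line
            # with exactly one delimiter.
            text = " ".join(transcribe(wav).replace("|", " ").split())
            f.write(f"{wav.stem}|{text}\n")
    return out_path


if __name__ == "__main__":
    # "recordings" is a placeholder directory of .wav files.
    print(build_metadata(Path("recordings"), whisper_transcriber("base")))
```

The transcriber is passed in as a plain function so you can swap in a different model, or review and correct the generated text before training.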

Where can we find the latest OpenAI language model rankings?
There is a plot of language performance on their repo: https://github.com/openai/whisper

I am not aware of a multi-lingual leaderboard for speech recognition models.

Whisper solves it; that's its purpose.