Hacker News
by echelon 899 days ago
RVC is voice conversion (audio to audio), and it's typically fine-tuned on the target voice.

This is zero-shot TTS. Reference samples are encoded into vectors, and those vectors serve as conditioning input at inference time. There's no need to retrain the model unless you want it to generalize or perform better.
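To make the "no retraining" point concrete, here is a minimal toy sketch, assuming the common zero-shot design: a speaker encoder maps a reference clip of any length to one fixed-size vector, and a frozen decoder is conditioned on that vector. `speaker_embedding`, `synthesize`, and the weight matrix are hypothetical stand-ins (real systems use trained neural networks), not this project's code.

```python
import math
from typing import List

Vector = List[float]

def speaker_embedding(reference_frames: List[Vector]) -> Vector:
    """Hypothetical toy speaker encoder: average the reference clip's
    per-frame features, then L2-normalize. Any clip length yields one
    fixed-size vector."""
    dim = len(reference_frames[0])
    mean = [sum(f[i] for f in reference_frames) / len(reference_frames) for i in range(dim)]
    norm = math.sqrt(sum(x * x for x in mean)) or 1.0
    return [x / norm for x in mean]

def synthesize(text_frames: List[Vector], speaker: Vector, weights: List[Vector]) -> List[Vector]:
    """Hypothetical frozen decoder: each output frame is a fixed linear map
    of [text features ++ speaker embedding]. The weights never change;
    only the speaker vector does."""
    out = []
    for frame in text_frames:
        x = frame + speaker  # list concatenation = conditioning on the speaker
        out.append([sum(w * xi for w, xi in zip(row, x)) for row in weights])
    return out

# Frozen "model": 2 output dims; input is 2 text dims + 2 speaker dims.
W = [[1.0, 0.0, 1.0, 0.0],
     [0.0, 1.0, 0.0, 1.0]]
text = [[0.5, 0.5], [1.0, 0.0]]
voice_a = speaker_embedding([[3.0, 0.0], [5.0, 0.0]])  # -> [1.0, 0.0]
voice_b = speaker_embedding([[0.0, 2.0], [0.0, 4.0]])  # -> [0.0, 1.0]
print(synthesize(text, voice_a, W))  # -> [[1.5, 0.5], [2.0, 0.0]]
print(synthesize(text, voice_b, W))  # -> [[0.5, 1.5], [1.0, 1.0]]
```

Switching voices only means passing a different embedding; `W` is never touched, which is the contrast with fine-tuning a model per voice.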


It isn't, though. People need to read the paper and the author's comments: they aren't actually doing the voice generation themselves. They pass the text off to VITS, and their secret sauce is tone mapping on the VITS output. So if anything they're a competitor to RVC; it's just that the version they published also includes VITS.
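The two-stage structure described above (a base TTS generates speech, then a separate step remaps its tone) can be sketched as plain function composition. This is a hedged toy: `toy_base_tts` stands in for VITS and `make_toy_converter` for the tone-mapping step, both hypothetical; the "shift toward the target embedding" rule is an illustrative assumption, not the published method.

```python
from typing import Callable, List

Frames = List[List[float]]

def two_stage_tts(text: str,
                  base_tts: Callable[[str], Frames],
                  convert: Callable[[Frames], Frames]) -> Frames:
    """Stage 1 generates speech from text; stage 2 remaps its tone.
    Neither stage depends on the other's internals."""
    return convert(base_tts(text))

def toy_base_tts(text: str) -> Frames:
    # Hypothetical stand-in for a base TTS such as VITS:
    # one 2-dim feature frame per character.
    return [[float(ord(c) % 7), 1.0] for c in text]

def make_toy_converter(source: List[float], target: List[float]) -> Callable[[Frames], Frames]:
    # Hypothetical linear "tone mapping": shift every frame by the
    # difference between the target and base speaker embeddings.
    delta = [t - s for s, t in zip(source, target)]
    def convert(frames: Frames) -> Frames:
        return [[x + d for x, d in zip(f, delta)] for f in frames]
    return convert

out = two_stage_tts("hi", toy_base_tts, make_toy_converter([0.0, 1.0], [2.0, 0.0]))
print(out)  # -> [[8.0, 0.0], [2.0, 0.0]]
```

Because the converter only sees audio features, any converter with the same signature (an RVC model, say) could be dropped into the second slot.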
Interesting.

Funnily enough, a lot of RVC packages use VITS to generate the speech for TTS and then run RVC on it.