| HN Mirror



	by echelon 899 days ago
	RVC is voice conversion (audio to audio), and it's typically fine-tuned. This is zero-shot TTS. Samples create vector encodings that serve as input to inference. There's no retraining of the model unless you want it to generalize or perform better.

1 comments

cchance 899 days ago

It isn't, though. People need to read the paper and the author's comments: they aren't actually doing the voice generation. They pass the text off to VITS, and their sauce is the tone mapping they do on that VITS output. So if anything, they're a competitor to RVC; it's just that the version they published also includes VITS.


echelon 899 days ago

Interesting.

Funnily enough, a lot of RVC packages use VITS to do RVC for TTS.
