|
|
|
|
|
by echelon
899 days ago
|
|
RVC is voice conversion (audio to audio), and it's typically finetuned. This is zero shot TTS. Samples create vector encodings that serve as input to inference. There's no retraining the model unless you want it to generalize or perform better. |
|