by gnat
703 days ago
From the poster: In this work, we propose a lightweight (~20M-parameter) causal voice conversion solution that can run in real time with low latency on a commercially available mobile device. The key design elements are: (1) using a causal encoder to learn soft speech units; (2) injecting whitened f0 to improve pitch stability without leaking source-speaker information. In our later V2 version, we found that f0 rescaling followed by NSF-style harmonic-plus-noise conditioning (as is done in RVC) results in better quality.
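For anyone wondering what the pitch pieces might look like, here is a minimal sketch under stated assumptions, not the authors' code. It assumes "whitened f0" means normalizing log-f0 over voiced frames to zero mean and unit variance (keeping the contour, dropping the speaker's absolute pitch level and range), and that "f0 rescaling" means mapping that back onto a target speaker's log-f0 mean/std. The harmonic-plus-noise part is a simplified take on the sine-based source module from NSF vocoders. All function names, defaults, and target statistics are hypothetical. Statistics here are per utterance; a truly causal streaming system would need running estimates instead.

```python
import math
import random


def whiten_f0(f0_hz, eps=1e-5):
    """Hypothetical per-utterance log-f0 whitening. Returns (z, mu, sigma).

    Voiced frames (f0 > 0) become zero-mean, unit-variance log-f0;
    unvoiced frames (f0 == 0) are set to 0.0.
    """
    logs = [math.log(f) for f in f0_hz if f > 0]
    if not logs:
        return [0.0] * len(f0_hz), 0.0, 1.0
    mu = sum(logs) / len(logs)
    sigma = math.sqrt(sum((x - mu) ** 2 for x in logs) / len(logs)) + eps
    z = [(math.log(f) - mu) / sigma if f > 0 else 0.0 for f in f0_hz]
    return z, mu, sigma


def rescale_f0(z, voiced, target_mu, target_sigma):
    """Map whitened log-f0 onto assumed target log-f0 stats; Hz out, 0.0 when unvoiced."""
    return [math.exp(zi * target_sigma + target_mu) if v else 0.0
            for zi, v in zip(z, voiced)]


def frames_to_samples(f0_frames, hop):
    """Nearest-neighbour upsampling from frame rate to sample rate."""
    return [f for f in f0_frames for _ in range(hop)]


def harmonic_plus_noise(f0_samples, sr, n_harmonics=4, amp=0.1,
                        noise_std=0.003, seed=0):
    """Simplified NSF-style excitation (noise_std must be > 0).

    Voiced samples: sum of sine harmonics below Nyquist plus small noise.
    Unvoiced samples: Gaussian noise rescaled to standard deviation amp / 3.
    """
    rng = random.Random(seed)
    phase = 0.0  # running phase in cycles, accumulated sample by sample (causal)
    out = []
    for f in f0_samples:
        noise = rng.gauss(0.0, noise_std)
        if f > 0:
            phase = (phase + f / sr) % 1.0
            harmonics = [k for k in range(1, n_harmonics + 1) if k * f < sr / 2]
            s = sum(math.sin(2 * math.pi * k * phase) for k in harmonics)
            out.append(amp * s / n_harmonics + noise)
        else:
            out.append(amp / (3 * noise_std) * noise)
    return out


if __name__ == "__main__":
    f0 = [110.0, 0.0, 120.0, 130.0]  # hypothetical source f0 track, Hz per frame
    z, mu, sigma = whiten_f0(f0)
    target = rescale_f0(z, [f > 0 for f in f0],
                        target_mu=math.log(220.0), target_sigma=sigma)
    excitation = harmonic_plus_noise(frames_to_samples(target, hop=160), sr=16000)
    print(len(excitation))
```

The whitened contour z is what you would feed the model as a speaker-agnostic pitch cue, while the rescaled f0 drives the excitation signal that a harmonic-plus-noise decoder conditions on.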