by nielsinho 1231 days ago
It uses the TorToiSe TTS model for generation. Generating conditioning voice latents from a few short audio samples is simple. Transcribed JRE episodes were likely part of the TorToiSe training data, which would explain why it's so good at recreating his voice characteristics in particular.