|
|
|
|
|
by reissbaker
595 days ago
|
|
This is really cool. FWIW, existing open-source TTS engines are really bad in comparison to what you have here: I know this is voice-to-voice, but I think there'd be a lot of appetite to get this to also be multimodal and accept text (essentially making it a really good TTS model, in addition to a great voice-to-voice model). I suppose someone could hack their way around the problem by finetuning it to essentially replay Piper (or whatever) output, only with more natural prosody and intonation. And then have the text LLM pipe to Piper, and Piper pipe to Hertz-dev. But it would be pretty useful to have it accept text natively! |
|