by mitchsayre
37 days ago
Just emailed you, but I'll reply here as well in case anyone comes across this thread later and finds it useful.
-TTS: I am actively working on this at Wfloat and just released a 30M-param model with 20 voices plus emotion and intensity control that runs even on legacy 2017 phones.
-ASR: I think this is in a relatively good spot; the current models small enough to fit on-device just make more transcription errors.
-LLM: For sure the main bottleneck. I know a bunch of people are working on this one. The problem with LLMs is just that they have to be so big to actually know how to do anything.