by mitchsayre 37 days ago
Just emailed you but I'll reply here as well in case anyone comes across this thread and finds it useful later.

- TTS: I am actively working on this at Wfloat and just released a 30M-param model with 20 voices plus emotion and intensity control. It runs even on legacy 2017 phones.

- ASR: I think this is in a relatively good spot. The current models small enough to fit on-device just make more transcription mistakes.

- LLM: For sure the main bottleneck. I know a bunch of people are working on this one. The problem with LLMs is just that they have to be so big to actually know how to do anything.