Hacker News

	by mitchsayre 84 days ago
	Just emailed you, but I'll reply here as well in case anyone comes across this thread later and finds it useful.

	- TTS: I'm actively working on this at Wfloat and just released a 30M-parameter model with 20 voices, plus emotion and intensity control, that runs even on legacy phones from 2017.
	- ASR: I think this is in a relatively good spot. The current models that are small enough to fit on-device just make more transcription mistakes.
	- LLM: Definitely the main bottleneck, and I know a lot of people are working on it. The problem with LLMs is that they have to be very large before they can actually do much of anything.
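	A rough back-of-the-envelope sketch of the size gap, counting weights only (parameters × bits per parameter). Two assumptions here do not come from the comment. The first is that the 30M TTS model is stored at 16-bit precision. The second is the 7B-parameter, 4-bit LLM, which is a hypothetical comparison point. The sketch ignores activations, KV cache, and runtime overhead, so real on-device memory use would be higher:

```python
def weight_footprint_mb(num_params: int, bits_per_param: int) -> float:
    """Approximate size of a model's weights alone, in megabytes (10**6 bytes).

    Ignores activations, KV cache, and runtime overhead.
    """
    total_bytes = num_params * bits_per_param / 8
    return total_bytes / 1e6


# 30M-parameter TTS model; 16-bit storage is an assumption
tts_mb = weight_footprint_mb(30_000_000, 16)
# Hypothetical 7B-parameter LLM, quantized to 4 bits per weight
llm_mb = weight_footprint_mb(7_000_000_000, 4)

print(f"TTS weights: {tts_mb:.0f} MB")
print(f"LLM weights: {llm_mb:.0f} MB")
print(f"ratio: {llm_mb / tts_mb:.0f}x")
```

	Under these assumptions, the quantized LLM's weights alone come to about 3.5 GB, against about 60 MB for the TTS model. That gap is why the LLM, not TTS or ASR, ends up as the on-device bottleneck.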