


	by andrew-w 141 days ago
	Thanks for the feedback. The current avatars use an STT-LLM-TTS pipeline rather than true speech-to-speech, which limits how well they pick up on nuances of pronunciation. Speech-to-speech models should solve this problem, but counterintuitively, the ones we've tried so far have not been fast enough.
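
	To make the limitation concrete, here is a rough sketch of a cascaded STT-LLM-TTS pipeline. Everything in it is hypothetical (the `Audio` type, the stub `stt`, `llm`, and `tts` functions, the example name); it is not the actual avatar code. It only illustrates that the stages hand plain text to each other, so anything about how a word was pronounced is gone before the LLM sees it.

```python
from dataclasses import dataclass


@dataclass
class Audio:
    """Hypothetical stand-in for an audio clip."""
    transcript: str      # the words that were spoken
    pronunciation: str   # how they were spoken (placeholder for acoustic detail)


def stt(audio: Audio) -> str:
    # Speech-to-text keeps only the words; acoustic detail is dropped here.
    return audio.transcript


def llm(text: str) -> str:
    # Stub language model: it only ever sees text.
    return f"You said: {text}"


def tts(text: str) -> Audio:
    # Text-to-speech has to guess a pronunciation from text alone.
    return Audio(transcript=text, pronunciation="default")


def cascaded_reply(audio: Audio) -> Audio:
    # The three stages are chained, with plain strings in between.
    return tts(llm(stt(audio)))


user = Audio(transcript="My name is Saoirse", pronunciation="SUR-sha")
reply = cascaded_reply(user)
print(reply.transcript)
print(reply.pronunciation)
```

	In this toy version the user's pronunciation never reaches the output, because no stage after `stt` has access to it. A speech-to-speech model would take audio in and produce audio out without the text bottleneck, which is the reason it should help here.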