| HN Mirror

	by GaggiX 660 days ago
	This is just STT+LLM+TTS. The GPT-4o voice mode that is being released uses a single model to listen and generate audio tokens, which allows a much better understanding of the environment (like understanding two people talking at the same time) and much more powerful speech generation (like singing).
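
	To make the STT+LLM+TTS point concrete, here is a toy sketch of a cascaded pipeline. Everything in it is hypothetical: `Utterance`, `stt`, `llm`, `tts`, and the pitch numbers are stand-ins for illustration, not any real product's API. It only shows that once audio is flattened to text, the middle model no longer sees who spoke or how it sounded.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    speaker: str
    words: str
    pitch_hz: float  # toy stand-in for prosody / melody information


def stt(audio):
    # Hypothetical speech-to-text stage: keeps only the words,
    # dropping speaker identity and pitch.
    return " ".join(u.words for u in audio)


def llm(text):
    # Hypothetical text-only model: it can only react to the transcript.
    return f"heard: {text}"


def tts(text):
    # Hypothetical text-to-speech stage: text carries no pitch,
    # so this toy version always speaks at one fixed pitch.
    return [Utterance("assistant", text, 120.0)]


def cascaded(audio):
    # The STT+LLM+TTS chain: audio -> text -> text -> audio.
    return tts(llm(stt(audio)))


audio_in = [
    Utterance("alice", "hello", 220.0),
    Utterance("bob", "hi there", 110.0),
]
text_seen_by_llm = stt(audio_in)
reply = cascaded(audio_in)
```

	In this toy, `text_seen_by_llm` is a single flat string, so the two speakers are already merged before the model runs. A single model that consumes and emits audio tokens would not have that text bottleneck, which is the difference the comment is pointing at.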