| HN Mirror



	by gangster_dave 936 days ago
	I don't think you can do that quite yet, since TTS APIs need a full phrase in order to output fluent-sounding speech. If the input is short, the delivery, emotion, and pauses come out random for each word or token. I think that type of system will become possible once we have a multimodal model that both understands and outputs speech, with the intelligence of GPT-4.
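
	To illustrate the full-phrase constraint, here is a minimal sketch of the usual workaround: buffer the streamed text and only hand complete sentences to the TTS step. The function name `phrases_from_tokens` and the punctuation-based splitting rule are my own assumptions for illustration, not part of any particular TTS API, and the regex will mis-split abbreviations like "e.g.".

```python
import re
from typing import Iterable, Iterator

# Assumption: a sentence ends at '.', '!' or '?' followed by whitespace.
# This is a deliberate simplification, not a real sentence segmenter.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def phrases_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed text fragments and yield one complete sentence at a time."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last one is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be mid-sentence, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hel", "lo the", "re. How", " are", " you? ", "Fine"]
    for phrase in phrases_from_tokens(stream):
        # A hypothetical TTS call would go here, once per full phrase.
        print(phrase)
```

	The cost is latency: nothing is spoken until a sentence boundary arrives, which is the tradeoff described above.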