HN Mirror

	by ajaynraj 1182 days ago
	The short answer is that everything is streaming: as tokens come back from ChatGPT, we send them to the synthesizer as soon as possible. The long answer is in our code [0] :). [0] https://github.com/vocodedev/vocode-python/blob/main/vocode/...

1 comment

How does it sound good, though? Usually text-to-speech models need the full context to sound reasonable.

We chunk it up per sentence so it has some context!
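
For anyone curious what per-sentence chunking of a token stream might look like, here is a minimal sketch. It is not the vocode implementation; `chunk_sentences` and the boundary rule (a `.`, `!`, or `?` followed by whitespace) are assumptions made up for illustration. It buffers incoming tokens and emits each sentence as soon as it is complete, so the synthesizer gets a full sentence of context without waiting for the whole response.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule (hypothetical): sentence-ending punctuation
# followed by at least one whitespace character.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def chunk_sentences(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield complete sentences as they form."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last ends at a sentence boundary.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be mid-sentence; keep buffering it.
        buffer = parts[-1]
    # Flush whatever is left once the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    stream = ["Hel", "lo the", "re. How", " are you", "? I am", " fine"]
    for sentence in chunk_sentences(stream):
        print(sentence)  # each sentence would go to the synthesizer here
```

The trade-off is latency versus prosody: the first audio can only start after the first full sentence has arrived, but each synthesis request carries enough context to sound natural.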