by ajaynraj 1182 days ago
The short answer is that everything is streaming: as tokens come back from ChatGPT, we send them to the synthesizer as soon as possible. The long answer is in our code[0] :).

[0] https://github.com/vocodedev/vocode-python/blob/main/vocode/...

1 comment

How does it sound good, though? Usually text-to-speech models need the full context to sound reasonable.
We chunk it up per sentence so it has some context!
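
For anyone who wants to picture it, here is a rough sketch of the idea in Python. It is not the vocode code: `sentences_from_tokens` and the regex are made up for illustration, and a real implementation would likely handle abbreviations like "Dr." more carefully. It buffers streamed tokens and emits each sentence as soon as the next one starts, so the synthesizer gets a full sentence of context without waiting for the whole response.

```python
import re
from typing import Iterable, Iterator

# Assumption for this sketch: a sentence ends at '.', '!' or '?'
# followed by whitespace. This is a naive rule, not vocode's logic.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of text tokens."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence.
        for sentence in parts[:-1]:
            if sentence.strip():
                yield sentence.strip()
        # The last part may still be growing, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Stand-in for tokens arriving from a language model.
    stream = ["Hi", " there", ".", " How", " are", " you", "?", " Fine"]
    for sentence in sentences_from_tokens(stream):
        # In a real pipeline this is where the sentence would go to TTS.
        print(sentence)
```

The tradeoff is latency versus prosody: smaller chunks start playing sooner, but a whole sentence gives the synthesizer enough context for natural intonation.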