Hacker News
by
ajaynraj
1182 days ago
The short answer is that everything is streaming: as tokens come back from ChatGPT, we send them to the synthesizer as soon as possible. The long answer is in our code[0] :).
[0] https://github.com/vocodedev/vocode-python/blob/main/vocode/...
famouswaffles
1182 days ago
How is it sounding good, though? Usually text-to-speech models need the full context to sound reasonable.
KianHooshmand
1182 days ago
We chunk it up per sentence so it has some context!
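
For anyone curious what per-sentence chunking of a token stream could look like, here is a minimal sketch. It is not the vocode implementation: the function name `sentences_from_tokens` and the boundary rule (`.`, `!`, or `?` followed by whitespace) are assumptions made for illustration, and a real system would need to handle abbreviations, decimals, and so on.

```python
import re
from typing import Iterable, Iterator

# Assumed boundary rule: sentence-ending punctuation followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield each sentence as soon as it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Everything except the last part is a finished sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be mid-sentence, so keep buffering it.
        buffer = parts[-1]
    # Flush whatever is left when the stream ends.
    if buffer.strip():
        yield buffer.strip()


if __name__ == "__main__":
    # Hypothetical token stream standing in for a streaming LLM response.
    stream = ["Hel", "lo the", "re. How", " are", " you? I", " am fine"]
    for sentence in sentences_from_tokens(stream):
        # A real pipeline would hand each sentence to the synthesizer here.
        print(sentence)
```

Because the generator yields as soon as a boundary shows up, the synthesizer can start on the first sentence while later tokens are still arriving, and each chunk still carries a full sentence of context.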