by gangster_dave
936 days ago
I don't think you can do that quite yet, since TTS APIs need a full phrase to produce fluent-sounding speech. If the input is short, the delivery, emotion, and pauses vary randomly from word to word (or token to token). I think that kind of system will become possible once we have a multimodal model that both understands and outputs speech, with the intelligence of GPT-4.