by GaggiX
660 days ago
This is just STT+LLM+TTS. The GPT-4o voice mode that is being released uses a single model to listen and generate audio tokens. This allows a much better understanding of the environment (like understanding two people talking at the same time) and much more powerful speech generation (like singing).