by tomp
640 days ago
The problem with all these speech-to-speech multi-modal models is that, if you wanna do anything other than just talk, you need transcription. So you're back at square one. Current AI (even GPT-4o) simply isn't capable enough to do useful stuff. You need to augment it somehow - either modularize it, or add RAG, or similar - and for all of those, you need the transcript.
I am sympathetic to this view but strongly disagree that you need a transcript. Think about it a bit more!!