by Spiwux
915 days ago
I wonder if we're at a point where you could build a voice assistant like that, except almost real-time and streamed end to end: the user speaks, and speech-to-text starts streaming text while the user is still speaking. That text stream is piped into an LLM, which also streams its output text. That output text is streamed to text-to-speech, which also generates audio in a streaming manner.
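To make the idea concrete, here is a rough sketch of that three-stage pipeline using plain Python generators. Everything in it is a stand-in: `speech_to_text`, `llm`, `text_to_speech` and `run_pipeline` are hypothetical names, the "audio" is just encoded text, and the "LLM" simply echoes the prompt. It only shows the shape of the plumbing, where each stage pulls from the previous one and yields output as soon as it has some.

```python
from typing import Iterable, Iterator


def speech_to_text(audio_chunks: Iterable[bytes]) -> Iterator[str]:
    """Stand-in recognizer: emits one word per incoming audio chunk."""
    for chunk in audio_chunks:
        # Assumption: each fake "audio" chunk decodes to exactly one word.
        yield chunk.decode("utf-8")


def llm(words: Iterable[str]) -> Iterator[str]:
    """Stand-in LLM: ingests the prompt word by word, then streams a reply."""
    prompt = []
    for word in words:
        # A real model could start processing the partial prompt here.
        prompt.append(word)
    reply = f"You said: {' '.join(prompt)}. Anything else?"
    for token in reply.split(" "):
        yield token


def text_to_speech(tokens: Iterable[str]) -> Iterator[bytes]:
    """Stand-in TTS: emits one fake audio chunk per sentence-like phrase."""
    phrase = []
    for token in tokens:
        phrase.append(token)
        if token.endswith((".", "?", "!")):
            # Flush at sentence boundaries so playback can start early.
            yield " ".join(phrase).encode("utf-8")
            phrase = []
    if phrase:
        yield " ".join(phrase).encode("utf-8")


def run_pipeline(audio_chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Chain the three stages; nothing runs until the output is consumed."""
    return text_to_speech(llm(speech_to_text(audio_chunks)))


if __name__ == "__main__":
    microphone = iter([b"what", b"time", b"is", b"it"])
    for audio in run_pipeline(microphone):
        print(audio)
```

In a real system each stage would more likely be an async stream talking to a model or a network service, and the hard part is deciding when the user has finished speaking. The generator version just shows that the first chunk of output audio can be produced before the full reply has been generated.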
The speech recognition part needs work for sure, but when it works you can see the potential. It's very different from the way it feels to talk to Siri or even ChatGPT's voice mode. It won't be long before we are having real conversations with our computers.