by pzo
450 days ago
> low algorithmic latency of just 15 milliseconds

I guess overall latency will be higher, since the audio has to go to their server, then back to our server, then from our server to the STT provider and back to us, then to the LLM provider and back to us, and finally to the TTS provider and back to the user.

It's so weird that e.g. OpenAI doesn't provide a way to run a very simple STT + LLM + TTS voice pipeline entirely on their servers; this would reduce latency significantly. With this server-side audio processing, the pipeline right now would mostly have to look like this:

user phone -> our server -> krisp server -> our server -> OpenAI STT -> our server -> OpenAI LLM -> our server -> OpenAI TTS -> our server -> back to user

Then you have to hope that the user and all the servers are hosted in the same region.
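A rough back-of-the-envelope sketch of why the round trips add up: count the network hops (arrows) in the path above and compare them with a hypothetical setup where one provider chains STT, LLM, and TTS internally. The flat per-hop cost and the "provider voice pipeline" stage are assumptions for illustration only; real latency depends on regions, payload sizes, and each provider's processing time.

```python
def hop_count(path: list[str]) -> int:
    """Number of network hops (arrows) in a linear request path."""
    return len(path) - 1


def network_latency_ms(path: list[str], per_hop_ms: float) -> float:
    """Assumption: every hop costs the same one-way network delay."""
    return hop_count(path) * per_hop_ms


# Path from the comment: audio returns to "our server" between every stage.
split_pipeline = [
    "user phone", "our server", "krisp server", "our server",
    "OpenAI STT", "our server", "OpenAI LLM", "our server",
    "OpenAI TTS", "our server", "user phone",
]

# Hypothetical alternative: STT + LLM + TTS chained inside one provider,
# so the internal hops are assumed to be negligible (same datacenter).
provider_side_pipeline = [
    "user phone", "our server", "krisp server", "our server",
    "provider voice pipeline", "our server", "user phone",
]

if __name__ == "__main__":
    for name, path in [("split", split_pipeline), ("provider-side", provider_side_pipeline)]:
        print(name, hop_count(path), "hops,", network_latency_ms(path, 20), "ms at 20 ms/hop")
```

With a made-up 20 ms per hop, that is 10 hops (200 ms) versus 6 hops (120 ms) of pure network time, before any model inference, and the gap grows if the servers sit in different regions.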