| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by pzo 450 days ago

> low algorithmic latency of just 15 milliseconds

I guess overall latency will be higher since processing will have to go to their server than back to our server then back from our server to STT provider and back to as then back to LLM provider and back to us and last part to TTS provider and back to the user.

It's so weird that e.g. OpenAI doesn't provide a way to make very simple voice pipeline STT + LLM + TTS executed totally on their servers, this would reduce latency significantly.

The pipeline with this server side audio processing right now looks mostly would have to look like that:

user phone -> our server -> krisp server -> our server -> OpenAI STT -> our server -> OpenAI LLM -> our server -> OpenAI TTS -> our server -> back to user.

Then you have to hope that user and all servers are hosten in the same region

1 comments

edmundsauto 450 days ago

It depends on what you're optimizing against, I think. I'd guess OpenAI prioritizes retention, so they don't give up the recent massive user base growth. If you retain users, you get them to continue using the product (data flywheel), and you can iterate fast enough - you will eventually be successful.

link