by ilaksh 481 days ago
I think part of the issue is that, to get latency this low, they have to tune their speech-to-text to detect endpoints in very small increments and then send the text to the model immediately.

So unless a lot of engineering and/or training has gone into the main model being able to recognize exactly when it should keep waiting versus when it should give a real response, it will just see something like "user: empty response" or "user: uhmm" and assume it is supposed to respond to that.
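
To make the "keep waiting versus respond" decision concrete, here is a rough sketch of a gate that sits in front of the model and drops empty or filler-only fragments. This is a simpler stand-in for the in-model handling described above, not how any real product does it: the `should_respond` function, the `FILLERS` list, and the 700 ms threshold are all made up for illustration.

```python
import re

# Hypothetical filler list; a real system would need far more than this.
FILLERS = {"um", "umm", "uh", "uhh", "uhm", "uhmm", "hmm", "er", "erm"}


def should_respond(transcript: str, silence_ms: int, min_silence_ms: int = 700) -> bool:
    """Return True if the fragment looks like a finished turn, False to keep waiting."""
    words = re.findall(r"[a-z']+", transcript.lower())
    # Empty transcript or only filler words: the user is probably still thinking.
    if not words or all(w in FILLERS for w in words):
        return False
    # A trailing filler ("so I was thinking, um") suggests the user is mid-thought.
    if words[-1] in FILLERS:
        return False
    # Otherwise require a minimum pause (assumed threshold) before responding.
    return silence_ms >= min_silence_ms


if __name__ == "__main__":
    for text, pause in [("", 1000), ("uhmm", 1000), ("What's the weather today?", 800)]:
        print(repr(text), pause, should_respond(text, pause))
```

The tradeoff is the one the comment is getting at: raising `min_silence_ms` cuts down on false responses but adds exactly the latency they are trying to avoid, which is probably why it has to be learned rather than hard-coded.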