Hacker News
by nojs 789 days ago
This matches my experience doing it with Elixir/OpenAI/ElevenLabs as well.

Depending on the application, it’s also possible to fire the whole thing off pre-emptively and then use the early response unless later context explicitly invalidates it.
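A rough sketch of that pre-emptive pattern in Python with asyncio, in case it helps. Everything here is assumed for illustration: `generate_reply` is a hypothetical stand-in for the LLM + TTS round trip, and `invalidates` is a deliberately naive check (any change between the partial and final transcript counts as invalidation). A real application would need its own rule for what "explicitly invalidates" means.

```python
import asyncio
from typing import Awaitable


async def generate_reply(transcript: str) -> str:
    """Hypothetical stand-in for the slow LLM + TTS round trip."""
    await asyncio.sleep(0.05)  # simulated latency
    return f"reply to: {transcript}"


def invalidates(partial: str, final: str) -> bool:
    """Naive, assumed rule: the early reply is stale if the text changed."""
    return final.strip().lower() != partial.strip().lower()


async def respond(partial: str, final_transcript: Awaitable[str]) -> str:
    # Start generating from the partial transcript right away.
    early = asyncio.create_task(generate_reply(partial))
    # Meanwhile, wait for the final transcript to arrive.
    final = await final_transcript
    if invalidates(partial, final):
        # Later context contradicts the guess: drop it and redo the work.
        early.cancel()
        return await generate_reply(final)
    # The guess held, so most of the latency is already paid.
    return await early


async def demo(partial: str, final: str) -> str:
    """Simulate the final transcript arriving 10 ms after the partial one."""
    loop = asyncio.get_running_loop()
    fut: asyncio.Future = loop.create_future()
    loop.call_later(0.01, fut.set_result, final)
    return await respond(partial, fut)
```

Calling `asyncio.run(demo("book a table", "book a table"))` reuses the early reply, while a final transcript that differs triggers a fresh generation. The trade-off is paying for speculative calls that sometimes get thrown away.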

Another cool trick to get around TTS latency is to maintain an audio cache keyed by semantic meaning and have the LLM choose from the cache. This also cuts TTS API costs, which can be high.
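A minimal sketch of the cache idea in Python, under some assumptions: the "semantic meaning" is represented as a short key string, the LLM is shown the existing keys and returns either one of them or a new key plus the text to speak, and TTS is only called on a miss. `AudioCache`, `reply_audio`, `fake_llm` and `fake_tts` are all hypothetical names, not any real OpenAI or ElevenLabs API.

```python
from typing import Callable, Dict, List, Tuple

# Assumed signatures: the LLM sees the user text and the known keys and
# returns (semantic_key, text_to_speak); TTS turns text into audio bytes.
ChooseFn = Callable[[str, List[str]], Tuple[str, str]]
SynthFn = Callable[[str], bytes]


class AudioCache:
    """Audio clips keyed by a short label for what the clip means."""

    def __init__(self, synthesize: SynthFn) -> None:
        self._synthesize = synthesize
        self._clips: Dict[str, bytes] = {}
        self.tts_calls = 0  # counts cache misses, i.e. paid TTS requests

    def keys(self) -> List[str]:
        return sorted(self._clips)

    def get_or_synthesize(self, key: str, text: str) -> bytes:
        if key not in self._clips:
            self.tts_calls += 1
            self._clips[key] = self._synthesize(text)
        return self._clips[key]


def reply_audio(cache: AudioCache, choose: ChooseFn, user_text: str) -> bytes:
    # Let the LLM pick an existing key when the meaning matches.
    key, text = choose(user_text, cache.keys())
    return cache.get_or_synthesize(key, text)


# Stubs so the sketch runs without any network calls.
def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


def fake_llm(user_text: str, known_keys: List[str]) -> Tuple[str, str]:
    if "confirm" in user_text.lower():
        return "confirm_booking", "Your appointment is confirmed."
    return "ask_repeat", "Sorry, could you repeat that?"
```

With these stubs, two different user messages that both mean "confirm" resolve to the same key, so the second one is served from the cache with no TTS call. How well this works in practice depends on how reliably the LLM reuses keys instead of inventing near-duplicates.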

1 comment

Appointment scheduling seems like an ideal consumer of cached audio responses, but how can segments be concatenated into a natural-sounding response?