by shivekkhurana
126 days ago
The TTS/STT models are actually good and aggressively priced. I personally built a voice-mode AI assistant. STT time to first token is ~300 ms, and ~20 seconds of audio is transcribed in under 1 second. TTS time to first token is ~700 ms, and ~20 seconds of audio is generated in under 2 seconds.
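For anyone who wants to check numbers like these on their own setup, here is a minimal sketch of how time to first token and the real-time factor could be measured. It assumes the STT or TTS client exposes its output as a Python iterable of chunks; `measure_stream` and `real_time_factor` are hypothetical helper names, not part of any vendor SDK.

```python
import time
from typing import Callable, Iterable, Optional, Tuple


def measure_stream(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], float, int]:
    """Consume a streaming response and time it.

    Returns (seconds until the first chunk, total seconds, chunk count).
    The first value is None if the stream produced no chunks.
    """
    start = clock()
    first_chunk_latency: Optional[float] = None
    count = 0
    for _ in chunks:
        if first_chunk_latency is None:
            # Time to first token: delay before any output arrives.
            first_chunk_latency = clock() - start
        count += 1
    total = clock() - start
    return first_chunk_latency, total, count


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Processing time divided by audio duration; below 1.0 is faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds
```

With the figures quoted above, 20 seconds of audio handled in 1 second gives a real-time factor of 0.05, and 20 seconds generated in 2 seconds gives 0.1. Start the clock just before sending the request so that network and queueing delay are included in the measurement.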
I feel this is also why you don't see the same degree of hype as you would with the other players. When you take an application-driven approach to launching AI products, hype matters less than targeting decision-makers and showing that your product directly aligns with their outcomes.