by alephnerd
126 days ago
This aligns with what I've been thinking and chatting with my peers about - technical documentation would be useful for benchmarking performance globally, but I have heard murmurs of it already being used for voice-gen use cases by a WITCH company.
STT time to first token is ~300ms. ~20 seconds of audio is transcribed in under 1 second.
TTS time to first token is ~700ms. ~20 seconds of audio is generated in under 2 seconds.
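
For anyone who wants to compare these numbers against other systems, here is a rough sketch of the implied real-time factor (RTF, processing time divided by audio duration). The helper name `real_time_factor` is made up for illustration, and the inputs are only the upper bounds quoted above, not measurements of any particular setup.

```python
def real_time_factor(processing_s: float, audio_s: float) -> float:
    """Return processing time divided by audio duration.

    Values below 1.0 mean the system runs faster than real time.
    """
    if audio_s <= 0:
        raise ValueError("audio_s must be positive")
    return processing_s / audio_s


# Upper bounds taken from the figures quoted above (assumed, not measured here):
# ~20 s of audio, under 1 s for STT and under 2 s for TTS.
AUDIO_S = 20.0
stt_rtf = real_time_factor(1.0, AUDIO_S)
tts_rtf = real_time_factor(2.0, AUDIO_S)

print(f"STT RTF is below {stt_rtf:.2f}")
print(f"TTS RTF is below {tts_rtf:.2f}")
```

Since the quoted times are upper bounds, the actual RTF would be at or below these values. Time to first token is a separate latency figure and is not captured by RTF.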