|
|
|
|
|
by dandiep
459 days ago
|
|
1) Previous TTS models had problems with major problems accents. E.g. a Spanish sentence could drift from a Spain accent to Mexican to American all within one sentence. Has this been improved and/or is it still a WIP? 2) What is the latency? 3) Your STT API/Whisper had MAJOR problems with hallucinating things the user didn't say. Is this fixed? 4) Whisper and your audio models often auto corrected speech, e.g. if someone made a grammatical error. Or if someone is speaking Spanish and inserted an English word, it would change the word to the Spanish equivalent. Does this still happen? |
|
2/ We're doing everything we can to make it fast. Very critical that it can stream audio meaningfully faster than realtime
3+4/ I wouldn't call hallucinations "solved", but it's been the central focus for these models. So I hope you find it much improved