by nextaccountic
158 days ago
> Non-verbal cues are invisible to text: Transcription-based models discard sighs, throat-clearing, hesitation sounds, and other non-verbal vocalizations that carry critical conversational-flow information. Sparrow-1 hears what ASR ignores.

Could Sparrow instead be used to produce high-quality transcription that incorporates non-verbal cues? Or even use Sparrow AND another existing transcription/ASR system to augment the transcription with non-verbal cues?