by pbw
3773 days ago
It took me a while to understand what you did here. I was waiting for some kind of subtitles showing the recognition ability. But you're saying you ran speech recognition on the full video, then edited it according to where the words you targeted were found. I liked the bomb/terrorist one; the others didn't seem to be "saying" anything.
The important takeaway is that the Watson API parses a stream of spoken audio and tokenizes it (other services, such as Microsoft's Oxford, work only on 10-second chunks, i.e. they're optimized for user commands). What you get is a timestamp for when each recognized word appears, as well as a confidence level and, if you so specify, alternatives. Other speech-transcription options don't always provide this. I don't think PocketSphinx does, for example, nor does sending your audio to an mTurk-based transcription service.
Here's a little more detail about The Wire transcription, along with the JSON that Watson returns, and a simplified CSV version of it:
https://github.com/dannguyen/watson-word-watcher/tree/master...
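
To make the per-word output concrete, here is a rough Python sketch of flattening that kind of response into CSV-style rows and looking up where a target word occurs. The JSON layout (`results`, `alternatives`, `timestamps`, `word_confidence`) is my assumption about what the service returns, the sample data is made up, and the helper names (`flatten_words`, `find_word`, `to_csv`) are hypothetical; this is not the code from the repo.

```python
import csv
import io

# Made-up sample, shaped the way I assume a Watson response looks:
# each alternative carries [word, start, end] and [word, confidence] lists.
sample_response = {
    "results": [
        {
            "final": True,
            "alternatives": [
                {
                    "transcript": "there is a bomb ",
                    "confidence": 0.91,
                    "timestamps": [
                        ["there", 0.0, 0.3],
                        ["is", 0.3, 0.45],
                        ["a", 0.45, 0.5],
                        ["bomb", 0.5, 0.95],
                    ],
                    "word_confidence": [
                        ["there", 0.98],
                        ["is", 0.95],
                        ["a", 0.8],
                        ["bomb", 0.88],
                    ],
                }
            ],
        }
    ]
}


def flatten_words(response):
    """Return one dict per recognized word, using the first alternative of each result."""
    rows = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        best = alternatives[0]
        confidences = best.get("word_confidence", [])
        for i, (word, start, end) in enumerate(best.get("timestamps", [])):
            # Confidence is optional; leave it as None when it is missing.
            conf = confidences[i][1] if i < len(confidences) else None
            rows.append({"word": word, "start": start, "end": end, "confidence": conf})
    return rows


def find_word(rows, target):
    """Return (start, end) times in seconds for every case-insensitive match of target."""
    return [(r["start"], r["end"]) for r in rows if r["word"].lower() == target.lower()]


def to_csv(rows):
    """Serialize the flattened rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["word", "start", "end", "confidence"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = flatten_words(sample_response)
    print(find_word(rows, "bomb"))
    print(to_csv(rows))
```

The (start, end) pairs from `find_word` are what you would hand to a video editor or cutting tool to pull out just the clips where the word is spoken.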