by dangrossman 4471 days ago
Their homepage (https://wit.ai/) says "stream audio to the API, get structured information in return", but the API docs say "send natural language sentences (text) and get structured information (JSON) in return".

That's disappointing since the only problem I ran into with doing home automation via a web application was the speech-to-text, not processing commands once they were in text. A list of regular expressions works quite well for that.
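
For illustration, here is a minimal sketch of what such a list of regular expressions could look like. The rule table, the `Action` shape, and the `parseCommand` name are all made up for this example and not taken from any real project:

```typescript
// Hypothetical structured result for a recognized command.
type Action = { device: string; action: string; value?: number };

// Each rule pairs a pattern with a function that turns the match into an Action.
interface CommandRule {
  pattern: RegExp;
  build: (match: RegExpMatchArray) => Action;
}

// Illustrative rules only; a real setup would have one entry per phrase it supports.
const rules: CommandRule[] = [
  {
    pattern: /^(?:turn|switch) (on|off) the (.+)$/i,
    build: (m) => {
      const [, state = "", device = ""] = m;
      return { device: device.toLowerCase(), action: state.toLowerCase() };
    },
  },
  {
    pattern: /^set the (.+) to (\d+)(?: degrees| percent)?$/i,
    build: (m) => {
      const [, device = "", amount = "0"] = m;
      return { device: device.toLowerCase(), action: "set", value: Number(amount) };
    },
  },
];

// Try each rule in order and return the first match, or null if nothing fits.
function parseCommand(text: string): Action | null {
  // Drop surrounding whitespace and trailing punctuation from the transcript.
  const cleaned = text.trim().replace(/[.!?]+$/, "");
  for (const rule of rules) {
    const match = cleaned.match(rule.pattern);
    if (match) return rule.build(match);
  }
  return null;
}
```

With these rules, `parseCommand("Turn on the kitchen lights.")` yields a device of `"kitchen lights"` with action `"on"`, and anything that matches no rule comes back as `null`.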

The HTML5 Speech Recognition API in Chrome kinda sucks. It does speech to text well, but reliably keeping the API listening for speech at all has been challenging. Even a bunch of code basically checking "has the webkitSpeechRecognition object borked itself yet? recreate it and restart listening" every two seconds doesn't work reliably.
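
One alternative to polling every two seconds is to throw the recognizer away and build a fresh one whenever it reports that it has ended. The sketch below assumes only that the recognizer has a `continuous` flag, an `onend` callback, and a `start()` method, which the Web Speech API's recognition object does expose. The `RecognizerLike` interface, `keepListening`, and the injectable `schedule` parameter are names invented here. No claim that this is more reliable in practice than the polling approach.

```typescript
// Minimal subset of the recognizer surface this sketch relies on.
interface RecognizerLike {
  continuous: boolean;
  onend: (() => void) | null;
  start(): void;
}

// Keeps a recognizer running by creating a new instance each time the
// previous one ends. Returns a function that stops further restarts.
function keepListening(
  create: () => RecognizerLike,
  schedule: (fn: () => void, ms: number) => void,
  delayMs: number = 250,
): () => void {
  let stopped = false;

  const launch = (): void => {
    if (stopped) return;
    const rec = create(); // fresh object every time, never reused
    let rescheduled = false;
    // Guard so that one instance triggers at most one restart.
    const retry = (): void => {
      if (rescheduled) return;
      rescheduled = true;
      schedule(launch, delayMs);
    };
    rec.continuous = true;
    rec.onend = retry;
    try {
      rec.start();
    } catch {
      // start() threw, so try again with a new instance after the delay.
      retry();
    }
  };

  launch();
  return () => {
    stopped = true;
  };
}
```

In Chrome this might be wired up as `keepListening(() => new (window as any).webkitSpeechRecognition(), (fn, ms) => setTimeout(fn, ms))`, with result handlers attached inside the factory.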

I'd love a JavaScript API that can listen to the microphone, determine if anything has been spoken (versus silence or background noise), and when something that may be speech is detected, send it to another API endpoint that converts it to text.
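
The "has anything been spoken" part can be roughly approximated with an energy threshold. The sketch below is an assumption-heavy illustration and not an existing library: `SpeechGate`, `VadOptions`, and the threshold values are invented here. Frames are plain `Float32Array` sample buffers, which in a browser could come from something like the Web Audio API's `AnalyserNode.getFloatTimeDomainData`. Loudness alone cannot tell speech from a slammed door, so treat it as a first-pass gate before sending audio to a speech-to-text endpoint.

```typescript
// Root-mean-square level of one frame of PCM samples in the range [-1, 1].
function rms(frame: Float32Array): number {
  if (frame.length === 0) return 0;
  let sum = 0;
  for (let i = 0; i < frame.length; i++) {
    const s = frame[i] ?? 0;
    sum += s * s;
  }
  return Math.sqrt(sum / frame.length);
}

interface VadOptions {
  threshold: number;      // RMS level at or above which a frame counts as loud (assumed value)
  startFrames: number;    // consecutive loud frames needed to open a segment
  hangoverFrames: number; // consecutive quiet frames needed to close it
}

// Buffers frames while the input is loud and hands the whole segment to
// onSegment once it goes quiet again.
class SpeechGate {
  private opts: VadOptions;
  private onSegment: (frames: Float32Array[]) => void;
  private loudRun = 0;
  private quietRun = 0;
  private active = false;
  private buffered: Float32Array[] = [];

  constructor(opts: VadOptions, onSegment: (frames: Float32Array[]) => void) {
    this.opts = opts;
    this.onSegment = onSegment;
  }

  push(frame: Float32Array): void {
    const loud = rms(frame) >= this.opts.threshold;
    if (!this.active) {
      if (loud) {
        this.loudRun++;
        this.buffered.push(frame);
      } else {
        // A short blip followed by quiet is discarded as noise.
        this.loudRun = 0;
        this.buffered = [];
      }
      if (this.loudRun >= this.opts.startFrames) {
        this.active = true;
        this.quietRun = 0;
      }
      return;
    }
    this.buffered.push(frame);
    this.quietRun = loud ? 0 : this.quietRun + 1;
    if (this.quietRun >= this.opts.hangoverFrames) {
      this.onSegment(this.buffered); // this is where audio would be sent for transcription
      this.buffered = [];
      this.active = false;
      this.loudRun = 0;
      this.quietRun = 0;
    }
  }
}
```

The `startFrames` and `hangoverFrames` counters keep a single loud frame from opening a segment and a brief pause between words from closing one.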

Edit: They do take audio input, woo :) Thanks for the correction. https://wit.ai/docs/api#toc_9

3 comments

Their documentation has a pretty clear endpoint for sending raw audio: https://wit.ai/docs/api#toc_9
Thanks, don't know how I missed the links at the top of the page. I only checked each category on the left nav of the docs page.
The docs were indeed outdated. We fixed this, thanks!
They have JS, iOS, and Android SDKs. They all do speech-to-text first, then NLP processing on the resulting text.

https://wit.ai/docs