Hacker News
by nwalker85 1860 days ago
Really interesting use of intents and entities. I feel like some of this is reinventing the wheel, since there is already a grammar specification (W3C SRGS), but it is still a novel use of intents/entities. https://www.w3.org/TR/speech-grammar/
2 comments

Yeah, in my experience no one uses or supports that specification, which is a shame. If you're using something like AWS Connect with AWS Lex for telephony IVR, you can't just create a grammar and have Lex work out how to turn its recognized speech-to-text into something that matches a grammar rule. Instead, Lex returns speech-to-text results that follow general English grammar rules rather than what you prompted the user to reply with. You'll be unpleasantly surprised if you think that defining a custom entity as alphanumeric always prevents the utterance "[wʌn]" from matching "won" instead of "one" or "1".

Edit - Sorry, I realize that's a tangent. What I'm saying is that when I was evaluating speech-to-text engines for things like IVR systems using AWS and Google, neither of them supported SRGS. Microsoft does, I think, but they didn't have a telephony component, and IBM was ignored from the get-go, so "no one" really means "two very large companies."
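
To make the "won" vs. "one" problem above concrete, here is a rough sketch of the kind of post-processing you end up writing when the engine can't be constrained by a grammar. The homophone table and the function name `normalize_alphanumeric` are made up for illustration; none of this is part of Lex or any other vendor's API.

```python
# Hypothetical homophone table: words a general-English recognizer may
# return, mapped to the alphanumeric token the prompt actually asked for.
HOMOPHONES = {
    "won": "1", "one": "1",
    "to": "2", "too": "2", "two": "2",
    "for": "4", "four": "4",
    "ate": "8", "eight": "8",
    "see": "c", "sea": "c",
    "be": "b", "bee": "b",
    "why": "y",
    "you": "u",
}


def normalize_alphanumeric(transcript: str) -> str:
    """Map each word of a transcript to a single letter or digit."""
    tokens = []
    for word in transcript.lower().split():
        word = word.strip(".,!?")
        if len(word) == 1 and word.isalnum():
            tokens.append(word)  # already a single letter or digit
        elif word in HOMOPHONES:
            tokens.append(HOMOPHONES[word])
        else:
            # Nothing sensible to map to; let the caller re-prompt.
            raise ValueError(f"cannot map {word!r} to an alphanumeric token")
    return "".join(tokens).upper()


print(normalize_alphanumeric("a won too see"))
```

A real grammar would make this table unnecessary, since the recognizer itself would only consider in-grammar hypotheses; a lookup like this only patches the output after the fact.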

Some do, some don't, sure. Google STT for example supports class tokens natively. There are also services like UniMRCP that allow for certain SRGS grammar features to be used with Google STT, but they are limited in what constructs they support. I've worked pretty extensively with a platform called Verbio, and they fully support the SRGS grammar specification. I work in conversational AI, and when I do implementations, I have to evaluate the complexity of the use case, decide whether a full grammar will be needed, and choose an STT provider based on that.

My templating language was inspired by JSGF, which seems to have informed the ABNF version of the W3C Speech Grammars. I don't support probabilities, though, since those are derived during the n-gram model generation.

I would have preferred to use a standard. Perhaps this is something for a future version.
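
For readers unfamiliar with JSGF-style templates, here is a minimal sketch of how such a template can be expanded into plain sentences. It assumes only two constructs, `(a | b)` for alternatives and `[x]` for optional words, and ignores weights/probabilities entirely. The function name `expand` and the example template are hypothetical and are not the actual syntax or code of the project discussed here.

```python
import re
from typing import List


def expand(template: str) -> List[str]:
    """Expand a JSGF-like template into every sentence it can produce."""
    # Brackets and '|' are single tokens; anything else is a word.
    tokens = re.findall(r"[()\[\]|]|[^\s()\[\]|]+", template)
    pos = 0

    def expect(closer: str) -> None:
        nonlocal pos
        if pos >= len(tokens) or tokens[pos] != closer:
            raise ValueError(f"expected {closer!r} in template")
        pos += 1

    def parse_alternatives() -> List[List[str]]:
        # alternatives := sequence ('|' sequence)*
        nonlocal pos
        results = parse_sequence()
        while pos < len(tokens) and tokens[pos] == "|":
            pos += 1
            results.extend(parse_sequence())
        return results

    def parse_sequence() -> List[List[str]]:
        # sequence := (word | '(' alternatives ')' | '[' alternatives ']')*
        nonlocal pos
        parts: List[List[str]] = [[]]
        while pos < len(tokens) and tokens[pos] not in ("|", ")", "]"):
            tok = tokens[pos]
            pos += 1
            if tok == "(":
                options = parse_alternatives()
                expect(")")
            elif tok == "[":
                options = parse_alternatives() + [[]]  # [] = word omitted
                expect("]")
            else:
                options = [[tok]]
            # Cross every partial sentence with every option.
            parts = [p + o for p in parts for o in options]
        return parts

    sentences = parse_alternatives()
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} in template")
    return [" ".join(words) for words in sentences]


for sentence in expand("turn (on | off) [the] light"):
    print(sentence)
```

Sentences expanded this way could then serve as training text for an n-gram language model, which is where probabilities would come from rather than from the template itself.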