Unless things have changed a lot in the past few months, they actually are pretty limited locally at this point. Certainly you can do them, but you can't do them particularly well.
Mostly from playing a bit with Sphinx and loosely following the space. But, as you say, things are always changing.
I will note that different people have different definitions of "good" in this space. For example, I've used online services like otter.ai and, while they're useful for certain purposes, they're still much worse than a human at transcribing. The output certainly isn't worth my time to clean up into a publishable transcript.
But for something like Alexa, which is designed to recognize specific commands, it would be easy to design something with relatively good accuracy.
I have not tried the other [otter.ai] service but even with speech to text on my iPhone, which to note is not running luckily [locally], it seems pretty accurate at transcribing what i say
^ the above was transcribed on an iPhone and the errors are bracketed.
wav2letter can transcribe locally and well.