As a speech technologist, I am amazed by and proud of how far along the technology has progressed, especially over the last few years. Even my wife now uses speech input on mobile devices (and may finally think I may be doing something productive...). With that said, speech input is still a surprisingly finicky technology, and different people will see different behaviors across systems from different providers.
I can only imagine how finicky it is. But it is truly amazing tech, and quite revolutionary. I probably do about 75%+ of my searches via voice, and it would more likely be 90%+ if I wasn't embarrassed about talking to my phone in public and broadcasting my searches to anyone in earshot :P
Siri was completely unusable/unresponsive from 2011/2012, but then, somewhere around 2012/2013, started to become pretty good (most of the time) for things like, "Wake me up at 6:30 AM" - I used it for that type of query a lot. Dictation, though, was spotty - I would say about 10-20% of the time, I just got a spinning non-response, and even when it did work, it would be slow, and the results would be iffy. And, once again, I used the dictation a lot.
But sometime in 2014 (I can't really place it, but right around June/August), Siri all of a sudden turned a corner, and her dictation ability got markedly better. So much so that I now don't even bother typing into my iPhone if I'm in a place where I can talk to it - dictation is 99% flawless, much better than my typing, and unquestionably faster.
For whatever reason, Apple hasn't been making a big deal of this - perhaps because they don't want to admit how crappy it was before - but it really is a big deal. Siri is, 3 years later, what she should have been in 2011.
Can't wait to see what the next step in this evolution will be...
My understanding is that it is acoustic modeling that was drastically improved using deep learning. That is, while speech recognition improved, acoustic modeling improved more. So, strictly speaking, technology is now better at ignoring noise, rather than better at understanding speech. Of course, to users, there is no difference.