To put it in perspective, my team at IBM Watson has already published better numbers (10.4% WER vs 13.1% WER for Baidu) on the SWB dataset. We haven't run our model on the CH part, so we can't compare on the full test set. Paper here: http://www.mirlab.org/conference_papers/International_Confer....
Hi Jerome, those are great results! We got an email this morning from someone else on the Watson team pointing out that we didn't include the latest IBM number -- we'll be sure to update the results in the next version of the paper (three cheers for arXiv).
Of course, we openly say in the paper that we don't have the best result on the easy subset of Hub5'00 (we had it as 11.5%). We're more interested in advancing the state of the art on challenging, noisy, varied speech. That said, we'll be working to push the SWB number down too :)
The team is already working on seeing what we get with CH. We'll let you know where we land. But your results are definitely impressive. We love to see new published innovation in the field. Kudos to the team!