by magicmu
3798 days ago
I was thinking the same thing. It almost looks a little too good to be true -- although it does kinda make sense given the focus on GPU-based clusters. I wonder how this compares to Baidu's warp-ctc [1]. They don't really seem to be the same thing, and maybe I'm missing something since I'm just starting to get into ML, but it seems to be conspicuously absent from this writeup.
[1] https://github.com/baidu-research/warp-ctc
|
If so, while very cool, that's not a general solution. Scaling with batch sizes of 256 or lower would be the breakthrough. I suspect they get away with this because speech recognition has very sparse output targets (words/phonemes).
Too bad the code for the work linked below isn't open source, because they got g2 instances with ~2.5 Gb/s interconnect to scale:
http://www.nikkostrom.com/publications/interspeech2015/strom...