| HN Mirror

Thank you!

For fully end-to-end models, it's hard to say exactly. The Char2Wav paper demonstrates that there is hypothetically an architecture and a set of weights that can do synthesis end-to-end, but we cannot yet train such a system. On Reddit, one of the Char2Wav authors comments that they tried training it directly and didn't get great results, and at SVAIL we've also had some trouble doing so. I think it is very likely going to happen in the next several months or year, but we don't yet know exactly what needs to happen in order to get it to work.

As for inference, some of the inference optimizations do generalize. In fact, the GPU optimizations (persistent kernels) were originally developed by our systems team, and published in the Persistent RNN [0] paper. (This is a really powerful technique that CUDA makes very hard to implement, and I have a massive amount of respect for the folks who managed to make it work!) Persistent RNNs make training at close-to-peak-FLOPs with very low batch sizes plausible, and make GPU WaveNet inference plausible. At the moment, our CPU kernels are much more promising, but we don't know whether that will stay the case. For mobile, I think it is possible to get the current systems to work on fairly powerful mobile CPUs with a bunch more work into optimization and low-level assembly, but we haven't done it yet so time will tell.

[0] https://svail.github.io/persistent_rnns/ and http://jmlr.org/proceedings/papers/v48/diamos16.pdf