| This is stunning! Great stuff. Since the input and prediction are a single sequence, did you experiment with beam search or stochastic beam search decoding (maybe with additional diversity criteria)? I found that even simple models (Markov chains) got a big diversity boost from stochastic beam search; it might avoid the low-temperature repetition problems that can happen in a standard beam search. However, my music models are much, much (much) worse than this, so my relative improvement might be related to that. Similarly, I am getting really nice results in text (RNN-VAE) with scheduled sampling, so it might be worth experimenting with. I am amazed at how good this next-step sampled output is. The above ideas might just hurt the result; I am having a hard time imagining how it could be better. What soundfont/MIDI rendering package is used for this? The piano sound is really rich. Looking forward to hearing what creative things users will do with this model. |
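For anyone following along, here is a rough sketch of what a stochastic beam search over a Markov chain could look like. It is only one simple variant (sampling the surviving beams in proportion to their tempered probability instead of keeping the top-k), not necessarily what the commenter used, and the function name, the toy note chain, and all parameters are made up for illustration.

```python
import math
import random


def stochastic_beam_search(transitions, start, steps, beam_width=3,
                           temperature=1.0, rng=None):
    """Sample beams from a first-order Markov chain (illustrative sketch).

    transitions: dict mapping state -> {next_state: probability}.
    Returns a list of (sequence, log_probability), best first.
    """
    rng = rng or random.Random()
    beams = [([start], 0.0)]
    for _ in range(steps):
        # Expand every beam by every possible next state.
        candidates = []
        for seq, score in beams:
            for nxt, p in transitions.get(seq[-1], {}).items():
                if p > 0:
                    candidates.append((seq + [nxt], score + math.log(p)))
        if not candidates:
            break
        # Standard beam search would keep the top-k candidates here.
        # This variant instead samples k of them without replacement,
        # weighted by exp(log_prob / temperature).
        pool = list(candidates)
        chosen = []
        while pool and len(chosen) < beam_width:
            top = max(s for _, s in pool)  # subtract max for stability
            weights = [math.exp((s - top) / temperature) for _, s in pool]
            idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
            chosen.append(pool.pop(idx))
        beams = chosen
    return sorted(beams, key=lambda b: b[1], reverse=True)


# Hypothetical toy chain over three note names.
toy_chain = {
    "C": {"D": 0.5, "E": 0.5},
    "D": {"C": 0.3, "E": 0.7},
    "E": {"C": 0.6, "D": 0.4},
}
results = stochastic_beam_search(toy_chain, "C", steps=4, beam_width=3,
                                 rng=random.Random(0))
for seq, logp in results:
    print(" ".join(seq), round(logp, 3))
```

A lower `temperature` makes the selection behave more like ordinary top-k beam search, while a higher one spreads the beams over less likely continuations.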
There's also no consensus on whether the high- or low-temperature samples sound better. I've heard both opinions from several people.
Sageev did the final rendering. I'm not sure what he used, but I'm pretty sure it was nothing too fancy.