by kastnerkyle 3278 days ago
This is stunning! Great stuff.

Since the input and prediction are a single sequence, did you experiment with beam search or stochastic beam search decoding (maybe with additional diversity criteria)?

I found that even simple models (Markov chains) got a big diversity boost from stochastic beam search. It might avoid the low-temperature repetition problems that can happen in a standard beam search. However, my music models are much, much, (much) worse than this, so my relative improvement might be related to that.
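
To make the deterministic versus stochastic comparison concrete, here is a minimal sketch of one possible stochastic beam search over a Markov chain. It is not the code behind the results mentioned above: the transition table `toy_chain`, the function name `beam_search`, and the choice to sample survivors without replacement in proportion to sequence probability are all assumptions made for illustration.

```python
import math
import random

# Hypothetical two-state chain, used only for illustration.
toy_chain = {
    "a": {"a": 0.9, "b": 0.1},
    "b": {"a": 0.5, "b": 0.5},
}

def beam_search(trans, start, steps, beam_width, stochastic=False, rng=None):
    """Beam search over a first-order Markov chain.

    trans maps a state to {next_state: probability}.
    Returns a list of (sequence, log_probability) pairs.
    """
    rng = rng or random.Random(0)
    beams = [([start], 0.0)]
    for _ in range(steps):
        # Expand every surviving sequence by one step.
        candidates = []
        for seq, logp in beams:
            for nxt, p in trans[seq[-1]].items():
                if p > 0:
                    candidates.append((seq + [nxt], logp + math.log(p)))
        if not stochastic:
            # Deterministic: keep the beam_width most probable candidates.
            candidates.sort(key=lambda c: c[1], reverse=True)
            beams = candidates[:beam_width]
        else:
            # Stochastic (assumed variant): draw survivors one at a time,
            # without replacement, weighted by sequence probability.
            pool = list(candidates)
            beams = []
            while pool and len(beams) < beam_width:
                top = max(lp for _, lp in pool)
                weights = [math.exp(lp - top) for _, lp in pool]
                idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
                beams.append(pool.pop(idx))
    return beams

if __name__ == "__main__":
    print(beam_search(toy_chain, "a", steps=8, beam_width=3))
    print(beam_search(toy_chain, "a", steps=8, beam_width=3, stochastic=True))
```

In the deterministic branch the self-loop on `"a"` dominates every step, which is the repetition effect described above. The stochastic branch lets lower-probability continuations survive some of the time, at the cost of sometimes dropping the most likely sequence.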

Similarly, I am finding really nice results in text (RNN-VAE) with scheduled sampling; it might be worth experimenting with.
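
For readers unfamiliar with scheduled sampling, the idea is that during training the model is sometimes fed its own previous prediction instead of the ground-truth token, with the share of ground truth decaying over training. The sketch below shows only that input-selection step. The names `scheduled_inputs`, `predict_next`, and `linear_decay` are hypothetical, the linear schedule is one assumed choice among several, and none of this is taken from the RNN-VAE setup mentioned above.

```python
import random

def linear_decay(step, total_steps, floor=0.0):
    """Assumed schedule: probability of feeding ground truth, from 1.0 down to floor."""
    return max(floor, 1.0 - step / total_steps)

def scheduled_inputs(targets, predict_next, teacher_prob, rng):
    """Choose the input fed to the model at each step.

    inputs[t] is used to predict targets[t + 1]. The first input is always
    the true first token. After that, each input is the true token with
    probability teacher_prob, otherwise the model's own guess computed from
    the previous input via predict_next (a stand-in for the model).
    """
    inputs = [targets[0]]
    for t in range(1, len(targets) - 1):
        if rng.random() < teacher_prob:
            inputs.append(targets[t])                 # teacher forcing
        else:
            inputs.append(predict_next(inputs[-1]))   # model's own output
    return inputs

if __name__ == "__main__":
    rng = random.Random(0)
    notes = ["C", "E", "G", "E", "C"]
    always_c = lambda token: "C"  # placeholder model for the demo
    for step in (0, 50, 100):
        p = linear_decay(step, total_steps=100)
        print(step, p, scheduled_inputs(notes, always_c, p, rng))
```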

I am amazed at how good this next-step sampled output is. The above ideas might just hurt the result; I am having a hard time imagining how it could be better.

What soundfont/midi rendering package is used for this? The piano sound is really rich.

Looking forward to hearing what creative things users will do with this model.

2 comments

Hey Kyle, we didn't try anything more advanced than next-step sampling. You probably have a better sense than I do how much improvement such techniques are likely to yield. My unfounded suspicion is that we're close to the limit of generation quality from this dataset, and so I'm most interested in trying to gather 10-100x more skilled performances, one way or another.

There's also no consensus on whether the high- or low-temperature samples sound better. I've heard both opinions from several people.

Sageev did the final rendering. I'm not sure what he used, but I'm pretty sure it was nothing too fancy.

A bigger dataset of MIDI with velocity information and performance timing would be really, really great.

High temperature versus low is tough to compare - I find that sometimes low temperature seems better, then I change the random seed and my opinion flips.
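
As a rough illustration of why the comparison is so seed-dependent, here is a minimal sketch of next-step sampling with a temperature parameter. The logits and the function names `temperature_probs` and `sample_with_temperature` are made up for the example and are not from the model discussed here.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Return softmax(logits / temperature) as a list of probabilities."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    top = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def sample_with_temperature(logits, temperature, rng):
    """Draw one index from the temperature-scaled distribution."""
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

if __name__ == "__main__":
    logits = [2.0, 1.0, 0.0]  # hypothetical scores for three next events
    for temperature in (0.1, 1.0, 10.0):
        for seed in (0, 1):
            rng = random.Random(seed)
            draws = [sample_with_temperature(logits, temperature, rng) for _ in range(12)]
            print(temperature, seed, draws)
```

At low temperature almost every draw is the top-scoring event, so the seed barely matters step to step but repetition is likely. At high temperature the distribution flattens and two seeds can produce very different sequences, which makes side-by-side listening comparisons noisy.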

Same for stochastic versus deterministic beam search, length/diversity scoring, and so on. I have been meaning to blog on this, will send it your way when I get it posted.

For character text, stochastic seems nicer broadly (maybe due to the limited size of the Markov space, see [0] deterministic vs. [1] stochastic), but for music it depends on the representation I use. However, at least in this cherry-picked example, I find the repetition of the deterministic beam search hilarious even though it is "worse".

Interesting, I will have to ask him what it was. With that render, at least my bad samples will sound prettier.

Great job on the model again!

[0] https://badsamples.tumblr.com/post/160767248407/a-markov-arg...

[1] https://badsamples.tumblr.com/post/160777871547/stochastic-s...

I think the choice of piano really sells the quality of the result. Musically it's not that great since it still sounds like random noodling. Much better than any other implementations that I've heard.