Hacker News
by dharma1 3489 days ago
I'm an (ex-)musician playing with machine learning, and this is very interesting. Will check it out! Kudos for curating the dataset. So your initial goal is basically to build polyphonic transcription with CNNs?

I am starting to record my own dataset for solo jazz piano, all MIDI though: monophonic melodies with matching chord voicings and voice leading from one chord to the next. The goal is to learn to generate a good-sounding jazz piano arrangement for a given melody from nothing except monophonic input.

Style transfer is good at what is essentially texture transfer. I suspect it won't work that well for understanding music theory (or text), especially with long-range time-series dependencies, but I'll be very curious to see what emerges.

I'd like to hear more generative music samples from DeepMind's WaveNet too. The piano samples they published sounded very good, but it was unclear what the model had learned or generalised, and how much was semi-randomised recall. I haven't seen the open source implementations of WaveNet produce results as good yet, probably because it's computationally very expensive to train and run, which limits experimentation. I saw Aäron give a talk on it a couple of weeks ago, which helped me understand the stacked dilated convolutions, but I would still like to hear more music examples :)
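
For anyone else puzzling over the stacked dilated convolutions mentioned above, here is a rough sketch of why they are attractive for raw audio: the receptive field grows exponentially with depth when the dilation doubles at each layer, but only linearly without dilation. The kernel size of 2, the doubling schedule from 1 to 512, and the 16 kHz sample rate are assumptions picked for illustration, and `receptive_field` is a hypothetical helper, not part of any WaveNet implementation.

```python
def receptive_field(kernel_size, dilations):
    """Input samples visible to one output of a stack of dilated convolutions.

    Each layer with dilation d extends the receptive field by (kernel_size - 1) * d.
    """
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Assumed schedule: ten layers, dilation doubling each layer (1, 2, 4, ..., 512).
doubling = [2 ** i for i in range(10)]
# Same depth with no dilation, for comparison.
plain = [1] * 10

SAMPLE_RATE = 16_000  # assumed sample rate in Hz, for illustration only

for name, schedule in [("dilated", doubling), ("undilated", plain)]:
    rf = receptive_field(2, schedule)
    print(f"{name}: {rf} samples, {1000 * rf / SAMPLE_RATE:.2f} ms")
```

Under these assumptions, ten dilated layers see 1024 samples (64 ms at 16 kHz), while ten undilated layers see only 11. Even the dilated figure is short on a musical timescale, which fits the worry about long-range structure.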

1 comment

Yes, we're starting with the transcription task. CNNs for local prediction are interesting, and we're also curious about capturing the temporal structure of music with something recurrent. It seems like a time series model that understands something about Western music should help with music transcription, just as language models help with speech transcription.

The style transfer stuff comes later and, as you observe, we'll probably need some new ideas to make that work well. I haven't thought about this deeply yet, but my intuition is that instrumental timbre may be an audio analog of visual texture, so a reasonably direct "port" of style transfer to the audio domain might let us construct demos that, for example, rewrite a cello recording to sound like a trombone.

Let us know when your dataset is complete! I love jazz.