by jthickstun 3489 days ago
Hi savanaly,

When we talk about "what makes Bach sound like Bach," the technical concept we have in mind is the recent work in computer vision on style transfer. For example,

https://arxiv.org/abs/1508.06576

We are excited to work on adapting these models to the musical domain!

As for note prediction, you can see our results in our paper:

https://arxiv.org/abs/1611.09827

Our results are for simple models (2-layer, not very "deep"); we were interested in understanding the low-level "features" of music rather than building a model that maximizes performance. Nevertheless, the results are quite promising; I'm confident that someone using our dataset with a deep network and a lot of GPUs could blow our numbers out of the water! :)

Tutorials on how to set up and evaluate this task are available on our website:

http://homes.cs.washington.edu/~thickstn/start.html


I'm an (ex-)musician playing with machine learning, and this is very interesting; I will check it out! Kudos for curating the dataset. So your initial goal is basically to build polyphonic transcription with CNNs?

I am starting to record my own dataset for solo jazz piano, all MIDI though: monophonic melodies, with matching chord voicings and voice leading from one chord to the next. The goal is to learn to generate a good-sounding jazz piano arrangement for a given melody from nothing but monophonic input.

Style transfer is good at essentially texture transfer - I suspect it won't work that well for understanding music theory (or text), especially with long time series dependencies, but will be very curious to see what emerges.

I'd like to hear more generative music samples from DeepMind's WaveNet too, the piano samples they published sounded very good, but it was unclear what the model had learned or generalised - and how much was semi-randomised recall. I haven't seen the open source implementations of WaveNet produce as good results yet - probably because it's computationally very expensive to train and run, and that limits experimentation. I saw Aäron give a talk on it a couple of weeks ago which helped me understand the stacked dilated convolutions - but would still like to hear more music examples :)
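
For anyone else trying to picture the stacked dilated convolutions, here is a minimal pure-Python sketch. It is not WaveNet itself: the kernel size of 2, the all-ones weights, and the doubling dilation schedule are illustrative assumptions, and the function names are made up for this example. It only shows how the receptive field grows when each layer skips further back in time.

```python
def receptive_field(kernel_size, dilations):
    """Number of input samples that can influence a single output sample."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


def causal_dilated_conv(x, weights, dilation):
    """1-D causal convolution: y[t] = sum_k weights[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def stack(x, dilations, weights=(1.0, 1.0)):
    """Apply one causal dilated layer per entry in `dilations` (no nonlinearity)."""
    for d in dilations:
        x = causal_dilated_conv(x, weights, d)
    return x


# An impulse pushed through dilations 1, 2, 4 spreads over 8 samples,
# matching receptive_field(2, [1, 2, 4]).
impulse = [1.0] + [0.0] * 15
response = stack(impulse, [1, 2, 4])
```

With an assumed schedule of ten layers doubling from 1 to 512, `receptive_field(2, [2 ** i for i in range(10)])` gives 1024 samples, so the context grows exponentially with depth while the parameter count grows only linearly.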

Yes, we're starting with the transcription task. CNNs for local prediction are interesting, and we're also curious about capturing the temporal structure of music with something recurrent. It seems like a time series model that understands something about western music should help with music transcription just like language models help with speech transcription.

The style transfer stuff comes later and as you observe, we'll probably need some new ideas to make that work well. I haven't thought about this deeply yet, but my intuition is that maybe instrumental timbre is an audio analog of visual texture, so maybe a reasonably direct "port" of style-transfer to the audio domain would let us construct demos that, for example, rewrite a cello recording to sound like trombone.
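
To make the "texture" intuition concrete: the image style-transfer paper linked above summarizes style with Gram matrices of feature maps, summed over spatial positions. A rough sketch of the audio analog would sum over time frames instead. The feature values and the function name below are made up for illustration, and a real system would use learned features rather than hand-written numbers.

```python
def gram_matrix(features):
    """Channel-by-channel correlations, averaged over time frames.

    `features` is a list of channels, each a list of per-frame activations.
    """
    n_frames = len(features[0])
    return [
        [sum(a * b for a, b in zip(fi, fj)) / n_frames for fj in features]
        for fi in features
    ]


# Two hypothetical feature channels over four time frames.
frames = [
    [1.0, 0.0, 2.0, 0.0],
    [0.0, 3.0, 1.0, 0.0],
]

# Reorder the frames in time: the Gram matrix does not change.
order = [2, 0, 3, 1]
shuffled = [[channel[t] for t in order] for channel in frames]

original_stats = gram_matrix(frames)
shuffled_stats = gram_matrix(shuffled)
```

Because the statistic ignores the order of frames, it can plausibly capture something timbre-like but says nothing about long-range temporal structure, which fits the concern raised above.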

Let us know when your dataset is complete! I love jazz.