by birdsbolt 4067 days ago
It depends on exactly how global they are; a Markov chain of high enough order could be good enough. There's nothing stopping anyone from using additional global features when conditioning in the chain, but that would require quite a bit of data to construct a good distribution. Nonetheless, Markov chains can be pretty smart.
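
To make the idea of a higher-order chain concrete, here is a minimal sketch in Python. The toy corpus, the function names `train_markov` and `generate`, and the choice of order 2 are all assumptions made for illustration, not anything taken from a real system.

```python
import random
from collections import Counter, defaultdict

def train_markov(sequences, order):
    """Count which note follows each context of `order` previous notes."""
    table = defaultdict(Counter)
    for seq in sequences:
        for i in range(len(seq) - order):
            context = tuple(seq[i:i + order])
            table[context][seq[i + order]] += 1
    return table

def generate(table, seed, length, rng=None):
    """Extend `seed` up to `length` notes, sampling each note from its context's counts."""
    rng = rng or random.Random(0)
    order = len(seed)
    out = list(seed)
    while len(out) < length:
        counts = table.get(tuple(out[-order:]))
        if not counts:  # context never seen in training: stop early
            break
        notes, weights = zip(*counts.items())
        out.append(rng.choices(notes, weights=weights)[0])
    return out

# Toy corpus, made up for illustration.
corpus = [["C", "E", "G", "E", "C", "E", "G", "C"]]
table = train_markov(corpus, order=2)
melody = generate(table, seed=["C", "E"], length=8)
print(melody)
```

Raising `order` lets the chain see further back, but every extra note of context splits the training counts more thinly, which is where the need for "quite a bit of data" comes from.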

Or, if you want even better performance, you can use conditional random fields with exactly the same global features. They have the advantage of not needing so much data, because the distribution being modelled isn't a joint one, and CRFs are excellent with custom features (features are observed variables, and their distribution is implicitly present in the conditional distribution). The disadvantage is that you couldn't generate the sequence of notes (chords) as easily, because the model isn't generative (unlike Markov chains), but one can use Gibbs sampling (combined with CRFs) to search over the space of probable sequences.
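
A rough sketch of what Gibbs sampling over a CRF-style score could look like, in Python. Everything here is assumed for illustration: the three chord labels, the hand-set `unary` and `pairwise` weights (a trained CRF would learn these from data), and the toy melody. It only shows the sampling loop, not training.

```python
import math
import random

LABELS = ["C", "F", "G"]  # candidate chords (illustrative)

def unary(chord, note):
    # Hypothetical hand-set weight: reward chords whose root matches the melody note.
    return 1.5 if chord == note else 0.0

def pairwise(prev, cur):
    # Hypothetical hand-set weight: mildly penalise repeating the same chord.
    return -0.5 if prev == cur else 0.3

def score(chords, notes):
    """Unnormalised log-score of a full chord sequence given the observed notes."""
    total = sum(unary(c, n) for c, n in zip(chords, notes))
    total += sum(pairwise(a, b) for a, b in zip(chords, chords[1:]))
    return total

def gibbs_sweep(chords, notes, rng):
    """Resample every position once from its conditional given its two neighbours."""
    for i in range(len(chords)):
        weights = []
        for cand in LABELS:
            s = unary(cand, notes[i])
            if i > 0:
                s += pairwise(chords[i - 1], cand)
            if i + 1 < len(chords):
                s += pairwise(cand, chords[i + 1])
            weights.append(math.exp(s))
        chords[i] = rng.choices(LABELS, weights=weights)[0]
    return chords

rng = random.Random(0)
notes = ["C", "F", "G", "C"]                      # observed melody (toy input)
start = [rng.choice(LABELS) for _ in notes]       # random initial chord sequence
chords, best = list(start), list(start)
for _ in range(200):
    chords = gibbs_sweep(chords, notes, rng)
    if score(chords, notes) > score(best, notes):
        best = list(chords)                       # keep the best sequence visited
print(best, score(best, notes))
```

The sampler never needs the normalising constant, only the score differences at one position at a time, which is why it is usable even though the model is not generative in the Markov-chain sense.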

Or, some nicely trained convnets could get you even closer to the brain of the composer :D


No, Markov chains at least cannot work because they are fundamentally finite state machines with no global state.

Say you want to generate 'verse chorus verse slightly-different-chorus', which is an idea that I've seen in basically every type of music that I've listened to. If you want to generate a slightly different version of the first chorus, you need knowledge of the first chorus. That is not possible with a Markov chain unless the state that represents the start of the second chorus can only be reached given that the first chorus was generated; i.e. you need to code every possible second chorus into your Markov chain, i.e. you need to code every possible first chorus into your Markov chain, i.e. the human has composed the piece.

The thing with computer generated music is that music is complicated; it's fundamentally not just a set of rules that you can apply and get good music. Yes, counterpoint does have many rules and suggestions that can restrict you, but they don't specify all good music.

In the same way that if you start combining logical axioms and inference rules, you generally just get random useless theorems, combining musical rules in an unstructured way is pretty much guaranteed to get you useless sequences of locally-alright notes.

The correct way of using the rules is (with logic) to start at the conjecture you want to prove and use the computer to prove the theorem correct by working backwards. With counterpoint, it's to compose the music, click the 'check for mistakes' button in Sibelius and check that you haven't made any glaring errors.

Of course one needs knowledge of the first chorus, and that is possible with a Markov chain of sufficiently high order, or you could add all of the notes before the chorus as features during the transitions.
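
For a sense of scale, the number of distinct contexts a chain has to cover grows as the alphabet size raised to the order (sometimes called the degree). A back-of-the-envelope sketch, assuming 12 pitch classes and ignoring rhythm and octave entirely (both simplifications are mine):

```python
def num_contexts(alphabet_size, order):
    """Number of distinct length-`order` contexts over an alphabet of the given size."""
    return alphabet_size ** order

# 12 pitch classes is an assumed, deliberately small alphabet.
for order in (1, 2, 4, 8, 32):
    print(order, num_contexts(12, order))
```

This is the reason a high order needs a lot of data: most contexts will never appear in any training corpus.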

If you use CRFs you can condition on the whole piece and learn the model like that. Yes, you'll have to use a lot of data but models can be as global as you need them to be.

If you want to use a 'verse chorus verse slightly-different-chorus' way of composing, yes, you can use a first level of a chain to generate the probable musical sequence blocks, and generate each block separately, using at the same time features generated in each part (verse, chorus, slightly-different-chorus etc.) to keep the same feeling.
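
A minimal sketch of that two-level idea in Python. The section transition table, the uniform-random `new_block` stand-in for a note-level model, and the one-note `vary` rule are all assumptions for illustration. Note that the `memory` dictionary is extra state kept outside the first-level chain; it is what lets a repeated section be a variation of the earlier one.

```python
import random

rng = random.Random(1)
SCALE = ["C", "D", "E", "F", "G", "A", "B"]

# Hypothetical first-level transition table over section labels.
SECTION_NEXT = {
    "start": ["verse"],
    "verse": ["chorus"],
    "chorus": ["verse", "end"],
}

def sample_form(max_sections=6):
    """Walk the section-level chain to get a list such as verse, chorus, verse, chorus."""
    form, state = [], "start"
    while len(form) < max_sections:
        state = rng.choice(SECTION_NEXT[state])
        if state == "end":
            break
        form.append(state)
    return form

def new_block(length=8):
    # Stand-in for a second-level note model; here just uniform random notes.
    return [rng.choice(SCALE) for _ in range(length)]

def vary(block):
    # "Slightly different": copy an earlier block and change exactly one note.
    out = list(block)
    i = rng.randrange(len(out))
    out[i] = rng.choice([n for n in SCALE if n != out[i]])
    return out

def generate_piece():
    memory, piece = {}, []
    for label in sample_form():
        if label in memory:
            block = vary(memory[label])   # repeat of a section: vary the first version
        else:
            block = new_block()           # first time this section appears
            memory[label] = block
        piece.append((label, block))
    return piece

piece = generate_piece()
for label, block in piece:
    print(label, " ".join(block))
```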

If you train your model in the way described above, you can then pick a tune in your head, put it in and ask the model to generate the most probable sequence for the whole song. Or, if you're using CRFs with Gibbs sampling, you start from a complete piece and iterate until its probability is large enough. The same could be done, somewhat easily, with Hidden Markov Models (I just realised that Markov models might not be the thing I was referring to in the post above; I was talking about the statistical variant of a Markov chain).

Convolutional neural networks could do the same thing, probably even better than CRFs and HMMs. Music isn't more complex than language and people have been using these sequence modelling methods to do extraordinary things in natural language processing.

I honestly don't think (from your comments) that you know enough about music to make pronouncements of the sort that you are making. I'm sure that people are doing amazing stuff in natural language processing, but I'm also sure that you're underestimating the complexity of music.

Producing a program that can output quality music on demand would be largely comparable to producing a program that can output quality novels on demand. I'd be entirely unsurprised if it turned out to be an AI-complete problem; some evidence for this being that most humans with training are found to be incapable of composing quality music (where almost anyone can perform most of the tasks that have been solved by NLP researchers).

Not really, models in NLP go beyond human performance in some tasks (not tasks as trivial as part-of-speech tagging).

I have ten years of formal training in music, on piano (never went to college), and I assumed we weren't really talking about composing Rachmaninoff-like pieces. You seem to be aiming at genius-level compositions; that is currently unrealistic, and I was surely not talking about that.

You're also going into the philosophy of quality. What is quality? Are you doubting the ability of the model trained on thousands of classical compositions to reproduce a fully structured classical piece that sounds well and has a few leitmotifs? It's very easy to constrain the model with a leitmotif positioned at several places and ask it to find the most probable sequence (to fill in the blanks). It's very easy to take a composition, decompose it into its constituent parts (chorus, verse, etc.), train a sequence model on this kind of sequence, and then do the same for the higher-level stuff.
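
As a sketch of the fill-in-the-blanks step, here is a constrained Viterbi search over a first-order chain in Python. The three-note alphabet, the transition table `P`, the uniform prior on the first note, and the template with `None` marking blanks are all made up for illustration; a real model would be trained and much larger.

```python
import math

NOTES = ["C", "E", "G"]
# Hypothetical first-order transition probabilities P[prev][next].
P = {
    "C": {"C": 0.1, "E": 0.6, "G": 0.3},
    "E": {"C": 0.3, "E": 0.1, "G": 0.6},
    "G": {"C": 0.6, "E": 0.3, "G": 0.1},
}

def log_prob(seq):
    """Log-probability of the transitions in seq (first note treated as uniform)."""
    return sum(math.log(P[a][b]) for a, b in zip(seq, seq[1:]))

def fill_blanks(template):
    """Most probable sequence under P that agrees with every non-None template entry."""
    allowed = [[t] if t is not None else NOTES for t in template]
    # best[n] = (log-score, path) of the best partial sequence ending in note n
    best = {n: (0.0, [n]) for n in allowed[0]}
    for options in allowed[1:]:
        nxt = {}
        for n in options:
            cands = [(s + math.log(P[p][n]), path) for p, (s, path) in best.items()]
            s_best, path_best = max(cands, key=lambda c: c[0])
            nxt[n] = (s_best, path_best + [n])
        best = nxt
    return max(best.values(), key=lambda c: c[0])[1]

# The motif C E is pinned at two places; None marks notes for the model to choose.
template = ["C", "E", None, None, "C", "E", None, None]
filled = fill_blanks(template)
print(filled, log_prob(filled))
```

Pinned positions simply restrict which notes the search may consider at that step, so the motif costs nothing extra to enforce.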

I mean, I agree with you that rule-based systems wouldn't work. But statistical models, if used in music with as much fervor as they are used in NLP tasks, could absolutely produce regular compositions that don't sound like you're randomly spitting out notes.

Or are you aiming at profound genius compositions? Or maybe super-pop songs? Then I agree, that would be an AI-complete problem, equivalent to machine translation and 300 page novel production.

> Are you doubting the ability of the model trained on thousands of classical compositions to reproduce a fully structured classical piece that sounds well and has a few leitmotifs?

Yes, I am. It turns out to be immensely difficult to do basic compositional tricks, like writing acceptably good classical counterpoint, or harmonising simple chorales.

Writing a full scale piece is a whole other level of difficulty. Writing a full scale piece that's going to be played over and over is a level or two beyond that.

David Cope's EMI is probably the state of the art:

http://artsites.ucsc.edu/faculty/cope/mp3page.htm

Listen to the Bach and Chopin. If you know anything about music you can hear that they sound like what they are: randomised cut and paste mash-ups of elaboration techniques and motifs that lack the musical narrative logic that the original composers were so good at.

Basically they're competent but mediocre pastiche, glued together out of little bits and pieces, lacking any overall form or drive.

Now - you're supposed to learn this stuff at composition school, and getting a computer to do it to this level is certainly an achievement.

But it's still some way short of being interesting and memorable music.

I don't think pop is any easier. E.g. trance and progressive house sound totally formulaic - until you try to copy them, and realise that getting something good is harder than it sounds.

So no - it's in no way a trivial problem. And a naive Markov approach is in no way a good enough answer.