by peterkos 1128 days ago
These models aren't actually doing any musical comparison -- they are trained on audio, and from that audio they piece apart what a "note" and a "melody" and an "instrument" are, using the labelled training data. No intentional theory is being done!

Algorithmic music composition has usually been split into two:

1. Generate notes (re: theory, genre)

2. Generate sound

(e.g., EMI[0], Kulitta[1], MusicNet[2])
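
To make the two-step split concrete, here is a toy sketch in Python. It is not how EMI, Kulitta, or anything else cited here works; the scale, the random-walk "composer", the sine-wave "synth", and every name in it are assumptions made up for illustration. The only point is that step 1 produces symbolic notes and step 2 turns those notes into audio samples.

```python
import math
import random

SAMPLE_RATE = 8000  # samples per second (arbitrary choice for this sketch)
C_MAJOR = [60, 62, 64, 65, 67, 69, 71, 72]  # MIDI note numbers, C4 to C5


def generate_notes(length, seed=0):
    """Step 1: pick notes symbolically (a random walk over one scale)."""
    rng = random.Random(seed)
    idx = 0
    notes = []
    for _ in range(length):
        notes.append(C_MAJOR[idx])
        # Move by at most one scale degree, clamped to the scale's range.
        idx = min(max(idx + rng.choice([-1, 0, 1]), 0), len(C_MAJOR) - 1)
    return notes


def midi_to_hz(note):
    """Equal-temperament pitch, with A4 (MIDI note 69) at 440 Hz."""
    return 440.0 * 2 ** ((note - 69) / 12)


def render(notes, seconds_per_note=0.25):
    """Step 2: turn the notes into audio samples (plain sine tones)."""
    samples_per_note = int(SAMPLE_RATE * seconds_per_note)
    samples = []
    for note in notes:
        freq = midi_to_hz(note)
        samples.extend(
            math.sin(2 * math.pi * freq * t / SAMPLE_RATE)
            for t in range(samples_per_note)
        )
    return samples


melody = generate_notes(8)  # symbolic: a list of MIDI note numbers
audio = render(melody)      # sound: a list of floats in [-1, 1]
```

In this arrangement all the "theory" lives in `generate_notes` and all the "sound" lives in `render`, so either half can be swapped out without touching the other.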

Now we are doing both at the same time, and backwards. The model isn't (necessarily) going "write melody, then generate the sound", but rather, "here are 500 songs that are described with X, 500 with Y, and you want XY, so we'll combine these two" :)

(This is my best understanding, so feel free to correct)

[0]: http://artsites.ucsc.edu/faculty/cope/experiments.htm

[1]: https://hackage.haskell.org/package/Kulitta

[2]: https://zenodo.org/record/5120004

2 comments

That is it, pretty much.

The problem is that coherent musical structures are much more constrained. You can't just XYZ... into a space and get something that makes sense.

That will kind of work for low-density music, which includes a lot of landfill dance + subgenres. But these statistical models are blind to larger and more complex structures, and completely unaware of cultural context and semantics.

It's actually a harder problem than language modelling because the spaces and the grammars are much larger, especially once you start including sound quality and production values as well as arrangement and core composition.

This strikes me as the wrong approach. What is the end goal here? To have an AI black box that spits out an infinite stream of music? I don’t think people are going to be excited by music that has no human in the loop, nor any connection to the physical world.

We are already drowning in music: you can turn on Spotify and have enough music to fill a lifetime. Yet new music is still being produced. Why? Because music is ultimately a psychological experience, and the human connection is a not-insubstantial part of it.

There’s a place for AI in music, but it has to be white box: there needs to be scope for a human to jump in, modify things, and make it their own. Otherwise, who will care?

For many genres of music I like, it did a terrific job. I could listen to a few hours of this electronic remix of Chinese instrumental music.

And an infinite stream of music is exactly what I want. I don't want to curate or search. I want to feel and ask and get.

We already have infinite streams of Seinfeld