by 1-6 928 days ago
Seems like MusicXML is a great format for ML applications. You need to start somewhere and machine-readable code is important.

MusicXML seems to be more for notation and sheet music typesetting than for algorithmic operations on the notes themselves. Sure, you could train a model on it, but you'd be better off training on the specific domain and then translating up to the XML format with classical (non-ML) code.
Right, but sheet music is ubiquitous in countless musical contexts and there's very little attention to it from the ML side. Sheet music is somewhat arduous to create and there is definitely room for a lot of automation and ML could help out a lot. I experimented with a tokenizer / GPT-2 (decoder-only) model for MusicXML (https://github.com/jsphweid/xamil) that is able to generate single staff music somewhat coherently. But it's just a first step and I don't care about generating (EDIT: hallucinated) music. Ideally we could add an encoder part to a model like this that takes in MIDI tokens and spits out sheet music. But I haven't gotten that far and don't have the ML chops to do it at this time. But it shouldn't be impossible.
Having an MP3 to sheet music would be even better, but probably 10x harder to do well.
For now, between the state of the art source separation models (e.g. demucs) and transcription models (e.g. Magenta's MT3) the last mile seems to be MIDI -> MusicXML IMO. But yes, I suspect it'll become more end-to-end ML in time.
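To make the "MIDI -> MusicXML" last mile a bit more concrete, here is a minimal sketch of converting a single MIDI note into a MusicXML `<note>` element. The function name, the 480 ticks-per-quarter resolution, the 16th-note grid, and the "always spell black keys as sharps" rule are all assumptions for illustration; real transcription also has to handle pitch spelling by key, voices, ties, rests and barlines, which is where most of the difficulty lies.

```python
import xml.etree.ElementTree as ET

# Naive pitch spelling: every black key is written as a sharp (an assumption;
# a real converter would choose between sharps and flats based on the key).
STEPS = ["C", "C", "D", "D", "E", "F", "F", "G", "G", "A", "A", "B"]
ALTERS = [0, 1, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0]


def midi_note_to_musicxml(pitch, duration_ticks, ticks_per_quarter=480, divisions=4):
    """Build a MusicXML <note> element from one MIDI note (hypothetical helper).

    `divisions` is the number of MusicXML duration units per quarter note,
    so divisions=4 snaps every duration to a 16th-note grid.
    """
    ticks_per_division = ticks_per_quarter / divisions
    # Quantize the performed duration to the nearest grid unit (at least one).
    duration = max(1, round(duration_ticks / ticks_per_division))

    note = ET.Element("note")
    pitch_el = ET.SubElement(note, "pitch")
    ET.SubElement(pitch_el, "step").text = STEPS[pitch % 12]
    if ALTERS[pitch % 12]:
        ET.SubElement(pitch_el, "alter").text = str(ALTERS[pitch % 12])
    # Convention where MIDI note 60 is C4.
    ET.SubElement(pitch_el, "octave").text = str(pitch // 12 - 1)
    ET.SubElement(note, "duration").text = str(duration)
    return note


# A slightly short quarter note on C#4, as a human might play it.
example = midi_note_to_musicxml(61, 465)
print(ET.tostring(example, encoding="unicode"))
```

Even this toy version has to guess at things MIDI does not record (is it C# or Db? a quarter note played short, or a dotted eighth?), which is roughly why this step is a research problem rather than a file-format conversion.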
If you want to do ML on notation, then maybe. MIDI or PCM audio might be a better place to start if you want to work directly on the music.
Note that MIDI is a lot more effective when it comes to ML/AI, since it's multiple orders of magnitude less data. Daniel D. Johnson's (formerly known as Hexahedria, hired by Google Brain) model biaxial-rnn-music-composition is from 2015, requires very few resources for training or inference, and still delivers compelling, SOTA-or-close results wrt. improvising ("noodling") classical piano. https://github.com/danieldjohnson/biaxial-rnn-music-composit... You may also want to check out user kpister's recent port to Python 3.x and aesara: https://github.com/kpister/biaxial-rnn-music-composition (Hat tip: https://news.ycombinator.com/item?id=30328593 ).

Music generation from notation is pretty much the MNIST toy-scale equivalent for sequence/language learning models. It's surprising that so little attention is being paid to it, given how easy it is to get started with.
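As a rough sanity check on the "multiple orders of magnitude less data" claim, here is a back-of-the-envelope comparison, assuming the comparison is against raw PCM audio. The numbers are illustrative assumptions: CD-quality stereo audio, and a fairly busy piano part of 10 notes per second with about 4 bytes per MIDI event (3 message bytes plus roughly 1 byte of delta time).

```python
def pcm_bytes_per_second(sample_rate=44_100, bytes_per_sample=2, channels=2):
    """Raw PCM data rate; defaults are CD-quality stereo (an assumption)."""
    return sample_rate * bytes_per_sample * channels


def midi_bytes_per_second(notes_per_second=10, bytes_per_event=4):
    """Approximate MIDI data rate.

    Each note needs a note-on and a note-off event; bytes_per_event is an
    assumed average of 3 message bytes plus about 1 byte of delta time.
    """
    return notes_per_second * 2 * bytes_per_event


pcm = pcm_bytes_per_second()    # 176400 bytes per second
midi = midi_bytes_per_second()  # 80 bytes per second
ratio = pcm / midi              # 2205.0
print(f"PCM: {pcm} B/s, MIDI: {midi} B/s, ratio: {ratio:.0f}x")
```

Under these assumptions the gap is about three orders of magnitude; sparser music or compressed audio would shift the ratio in either direction.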

MIDI is absolutely horrible for ML. It lacks very necessary information such as articulation etc which are important to make sense of music. It's popular because it's simple but there is no way to understand music by just looking at MIDI.

I'm a hobbyist in this space (I'm a composer as well as a software engineer) and currently all the tools are very poor. MusicXML is better than MIDI. MEI [1] is better than MusicXML, etc.

The problem is that a minuscule amount of effort and money is spent on this field, because music overall makes peanuts. It really doesn't justify training expensive ML algorithms, which is unfortunate.

[1] https://music-encoding.org/about/
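To illustrate the articulation point above: a MusicXML note can carry markings such as staccato inside its `<notations>` element, while a MIDI note-on message only has a channel, a note number and a velocity. The snippet below is a minimal sketch; the `articulations` helper and the hand-written note are hypothetical examples, not taken from any tool mentioned in this thread.

```python
import xml.etree.ElementTree as ET

# A hand-written MusicXML note: a staccato quarter note on C4.
MUSICXML_NOTE = """
<note>
  <pitch><step>C</step><octave>4</octave></pitch>
  <duration>1</duration>
  <type>quarter</type>
  <notations>
    <articulations><staccato/></articulations>
  </notations>
</note>
"""


def articulations(note_xml):
    """Return the articulation tag names attached to one MusicXML <note>."""
    note = ET.fromstring(note_xml.strip())
    return [el.tag for el in note.findall("./notations/articulations/*")]


marks = articulations(MUSICXML_NOTE)
print(marks)

# The closest MIDI equivalent: note-on on channel 1, note 60 (C4), velocity 64.
# Nothing in these three bytes says "staccato"; a performance can only imply
# it through a shorter gap between note-on and note-off.
midi_note_on = (0x90, 60, 64)
```

Whether that implied information is enough depends on the task, which is what the replies below argue about.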

> MIDI is absolutely horrible for ML. It lacks very necessary information such as articulation etc which are important to make sense of music.

This depends enormously on the instrument. Consider someone playing a piece live on a keyboard: we can keep a MIDI recording of that and we've captured everything about their performance that the audience hears.

> MIDI is absolutely horrible for ML.

It depends what you're trying to do. If you're trying to generate sheet music that's pretty to look at and easily understandable to a performer, then yes obviously it's not enough. If you want notes that will actually sound good when played back, it's hard to beat it.

> If you want notes that will actually sound good when played back, it's hard to beat it.

I strongly disagree with this. There is no good algorithmic music generator trained on MIDI. They all generate elevator music.

Are you aware of the system I linked above? D.D. Johnson has a blogpost https://www.danieldjohnson.com/2015/08/03/composing-music-wi... with plenty of examples of what an instance of his model can generate. It may not be all that "good" in an absolute sense, but it's at least musically interesting, the opposite of elevator music. (There's also a proprietary model/AI called AIVA about which very little is known, but it does seem to be bona-fide AI output - albeit released in versions that have been orchestrated by humans - based on what it sounds like.)