by wokwokwok 1236 days ago
No. They can’t.

You could train a model that could, but these models can’t.

Paper: https://google-research.github.io/seanet/musiclm/examples/

Quote: “By relying on pretrained and frozen MuLan, we need audio-only data for training the other components of MusicLM. We train SoundStream and w2v-BERT on the Free Music Archive (FMA) dataset (Defferrard et al., 2017), whereas the tokenizers and the autoregressive models for the semantic and acoustic modeling stages are trained on a dataset containing five million audio clips, amounting to 280k hours of music at 24 kHz.”
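To put the quoted numbers in perspective, here is a quick back-of-the-envelope sketch. It only uses the three figures from the quote (five million clips, 280k hours, 24 kHz); the function name and the per-channel sample count are my own framing, not anything from the paper.

```python
# Figures taken from the quote above.
HOURS = 280_000
CLIPS = 5_000_000
SAMPLE_RATE_HZ = 24_000


def dataset_stats(hours, clips, sample_rate_hz):
    """Derive average clip length and total sample count (per channel)."""
    total_seconds = hours * 3600
    return {
        "avg_clip_seconds": total_seconds / clips,
        "total_samples": total_seconds * sample_rate_hz,
    }


stats = dataset_stats(HOURS, CLIPS, SAMPLE_RATE_HZ)
print(stats)
```

That works out to clips averaging a bit over three minutes and on the order of 10^13 raw samples, all of it waveform data.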

TL;DR: you can only get out of these models what you put in, and these ones were trained on raw audio.

If you want MIDI output, you need to train a model on MIDI data.
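To make the raw-audio vs MIDI distinction concrete, here is a rough sketch of the same one-second A4 note in both forms. `NoteEvent` is a hypothetical, heavily simplified stand-in for a MIDI note (real MIDI has channels, velocities, ticks and so on), and the sine wave is just a toy waveform, not how MusicLM represents anything internally.

```python
import math
from dataclasses import dataclass

SAMPLE_RATE_HZ = 24_000  # same rate as in the quote


def sine_waveform(freq_hz, seconds, sample_rate_hz=SAMPLE_RATE_HZ):
    """Raw audio: one amplitude value per sample, no notion of 'notes'."""
    n = int(seconds * sample_rate_hz)
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


@dataclass
class NoteEvent:
    """Hypothetical simplified symbolic note, loosely modeled on MIDI."""
    pitch: int         # MIDI note number; 69 is A4 (440 Hz)
    start_s: float     # onset time in seconds
    duration_s: float  # length in seconds


audio = sine_waveform(440.0, 1.0)                               # thousands of floats
symbolic = [NoteEvent(pitch=69, start_s=0.0, duration_s=1.0)]   # one event

print(len(audio), len(symbolic))
```

A model trained only on the first kind of data learns to predict more of the first kind of data. Nothing in it ever sees the second.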