by wokwokwok
1236 days ago
No, they can’t. You could train a model that could, but these models can’t.

Paper: https://google-research.github.io/seanet/musiclm/examples/

Quote: “By relying on pretrained and frozen MuLan, we need audio-only data for training the other components of MusicLM. We train SoundStream and w2v-BERT on the Free Music Archive (FMA) dataset (Defferrard et al., 2017), whereas the tokenizers and the autoregressive models for the semantic and acoustic modeling stages are trained on a dataset containing five million audio clips, amounting to 280k hours of music at 24 kHz.”

TL;DR: you can only get out of these models what you put in, and these ones are trained on raw audio. If you want MIDI output, you need to train a model on MIDI data.
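To make the raw-audio vs. MIDI distinction concrete, here is a rough sketch of what each kind of training data looks like. The 24 kHz rate and the 280k hours come from the quote; the `sine_samples` helper, the `note_event` dict layout, and the single-channel assumption are mine, purely for illustration, and are not how MusicLM represents anything.

```python
import math

SAMPLE_RATE_HZ = 24_000  # sample rate stated in the quoted passage


def sine_samples(freq_hz, seconds, rate=SAMPLE_RATE_HZ):
    """Hypothetical helper: raw audio is just one amplitude per sample.

    Nothing in this list says "this is an A4"; the pitch is only implicit
    in how the numbers oscillate.
    """
    n = int(seconds * rate)
    return [math.sin(2 * math.pi * freq_hz * i / rate) for i in range(n)]


# Raw audio: one second of a 440 Hz tone is 24,000 floats.
audio = sine_samples(440.0, 1.0)

# Symbolic (MIDI-like) data: the same note as one explicit event.
# This dict layout is an illustrative assumption, not a real MIDI file format.
note_event = {"pitch": 69, "start_s": 0.0, "duration_s": 1.0}  # 69 = A4

# Scale of the quoted dataset, assuming a single channel:
# 280k hours * 3600 s/hour * 24,000 samples/s.
total_samples = 280_000 * 3600 * SAMPLE_RATE_HZ

print(len(audio), "samples vs. 1 note event")
print(f"{total_samples:.3e} samples in 280k hours")
```

A model that has only ever seen sequences like `audio` has never seen anything shaped like `note_event`, which is the point above: getting notes out would take either training on symbolic data or a separate transcription step.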