It doesn't seem to like slow music. I jokingly gave it a prompt to create wedding processional music for the bride's entrance at ~60 bpm (I'm getting married in 2.5 weeks).
EDIT: Exact prompt "wedding procession bride entrance"
File "predict.py", line 211, in predict
raise ValueError(
ValueError: Failed to generate a loop in the requested 60.23 bpm. Please try again.
EDIT: At 52 bpm (exact) it seems to work. What it generated would not sound good if looped, however. In terms of style, it sounded a little music-box-like: celesta or so (think of the beginning of the Harry Potter soundtrack) with some sustained strings and pizzicato strings. That would be appropriate, except the rhythm and chords are fairly random and I wouldn't exactly call this musical :)
Many thanks for this joyous thing. I tried a few different prompts and always got something weird and interesting, but never quite what I wanted:
Firstly, with temperature set to 2:
“Amen break with a bag of spanners” (140 bpm): If the amen break is in there, I can’t tell. There does seem to be a kind of harp/bell thing doing the melody, though.
“John Bonham with kettle drums” (90 bpm): Lots of guitar, subdued drums, but could definitely be late-period Zeppelin. Variation 2 is the exception: Zep at the start and end, long pause in the middle so John can drag his sticks along a LEGO oil tanker.
“John Bonham with kettle drums and angry cat” (90 bpm): We are now inside the oil tanker.
Now setting the temperature to 1:
“Hardfloor in Luton Primark” (90 bpm): The bpm setting was an accidental leftover from the previous experiment, and the result sounds much more Primark than Hardfloor.
“Portishead at cheezy funfair” (110 bpm): It’s a very folk-y funfair. Accordions? Organs? What the hell?
MusicGen is an LLM on top of EnCodec tokens, instead of working directly with audio. EnCodec is a neural audio compression algorithm that encodes audio as tokens from a codebook. It's a really clever trick!
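To make the "tokens from a codebook" idea concrete, here is a toy sketch in plain Python. It is not EnCodec: the real thing uses a learned neural encoder and multiple residual codebooks, and the codebook, frames, and function names below are all made up for illustration. It only shows the basic move of replacing each short audio frame with the index of its nearest codebook vector, so that a language model can predict integers instead of raw samples.

```python
# Toy illustration only. The codebook and frames are hypothetical 2-D vectors;
# a real codec learns its codebooks and works on encoder outputs, not raw frames.

def nearest_token(frame, codebook):
    """Return the index of the codebook vector closest to `frame`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(frame, codebook[i]))

def encode(frames, codebook):
    """Map each frame to a discrete token id."""
    return [nearest_token(f, codebook) for f in frames]

def decode(tokens, codebook):
    """Map token ids back to (lossy) frame vectors."""
    return [codebook[t] for t in tokens]

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.1), (0.9, 0.2), (0.2, 0.8), (1.1, 0.9)]

tokens = encode(frames, codebook)
print(tokens)                    # [0, 1, 2, 3]
print(decode(tokens, codebook))  # the reconstructed, quantized frames
```

The sequence of token ids is what a model like this would be trained to continue; decoding turns the predicted ids back into audio, with some loss.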
"We introduce a proper inductive bias of periodicity to the generator by applying a recently proposed
periodic activation called Snake function (Liu et al., 2020), defined as fα(x) = x +
1
α
sin2
(αx),
where α is a trainable parameter that controls the frequency of the periodic component of the signal
and larger α gives higher frequency. The use of sin2
(x) ensures monotonicity and renders it amenable
to easy optimization. Liu et al. (2020) demonstrates this periodic activation exhibits an improved
extrapolation capability for temperature and financial data prediction."
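For anyone who wants to poke at the Snake function from that quote, here is a minimal sketch in plain Python. It treats alpha as a fixed number rather than a trainable parameter, and the function names are my own. The monotonicity claim follows from the derivative, which works out to 1 + sin(2αx) and so never goes below zero.

```python
import math

def snake(x, alpha=1.0):
    """Snake activation: x + (1/alpha) * sin^2(alpha * x)."""
    return x + (1.0 / alpha) * math.sin(alpha * x) ** 2

def snake_grad(x, alpha=1.0):
    """Analytic derivative: 1 + sin(2 * alpha * x), which lies in [0, 2]."""
    return 1.0 + math.sin(2.0 * alpha * x)

# Evaluate on a grid and check that the output never decreases.
alpha = 2.0  # assumed value, chosen only for the demo
xs = [i * 0.01 for i in range(-500, 501)]
ys = [snake(x, alpha) for x in xs]
is_monotone = all(b >= a - 1e-12 for a, b in zip(ys, ys[1:]))
print(is_monotone)
```

Larger values of alpha make the sine term wiggle faster while shrinking its amplitude, which matches the quote's remark that larger α gives higher frequency.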
Prompting with "harsh noise wall" resulted in some cool industrial breakbeats instead of raw noise. Looks like AI will not be taking Merzbow's job any time soon.
I can see how it would be much simpler for an "AI" to generate a techno sound than something more melodic. I tried getting it to make things, and had no luck getting anything that sounded close to what I requested. Some of it had a beat, but I don't think I could dance to it (can't tell, since it's not actually loopable), so I gave it a 2.
I got some unexpected melodic mellow Indian-sounding sitar-music with the input prompt "suomisaundi". Not at all what I expected, but quite nice nevertheless!
It worked better with "suomisaundi psychedelic trance spugedelic".