MusicGen is an LLM on top of EnCodec tokens, instead of working directly with audio. EnCodec is neural audio compression algorithm that encodes audio as tokens from a codebook. It's a really clever trick!
"We introduce a proper inductive bias of periodicity to the generator by applying a recently proposed
periodic activation called Snake function (Liu et al., 2020), defined as fα(x) = x +
1
α
sin2
(αx),
where α is a trainable parameter that controls the frequency of the periodic component of the signal
and larger α gives higher frequency. The use of sin2
(x) ensures monotonicity and renders it amenable
to easy optimization. Liu et al. (2020) demonstrates this periodic activation exhibits an improved
extrapolation capability for temperature and financial data prediction."
MusicGen is an LLM on top of EnCodec tokens, instead of working directly with audio. EnCodec is neural audio compression algorithm that encodes audio as tokens from a codebook. It's a really clever trick!