by KRAKRISMOTT 1098 days ago
How is this possible? I thought deep learning models struggle with approximating periodic functions like sin.
2 comments

Here is the MusicGen paper from Facebook research: https://arxiv.org/abs/2306.05284

MusicGen is an LLM on top of EnCodec tokens, instead of working directly with audio. EnCodec is a neural audio compression algorithm that encodes audio as tokens from a codebook. It's a really clever trick!
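To make the "tokens from a codebook" idea concrete, here is a toy sketch of nearest-neighbor quantization. It is not EnCodec's actual code: the function names, the 2-D frames, and the four-entry codebook are all made up for illustration, and as far as I understand the real model learns its encoder and uses residual vector quantization with several codebooks rather than a single lookup.

```python
# Toy codebook quantization (hypothetical names and data, not EnCodec itself).

def quantize(frames, codebook):
    """Map each frame vector to the index of its nearest codebook entry."""
    tokens = []
    for frame in frames:
        # Squared Euclidean distance from this frame to every codebook entry.
        dists = [sum((f - c) ** 2 for f, c in zip(frame, entry)) for entry in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def dequantize(tokens, codebook):
    """Recover the (lossy) frame vectors by looking tokens up in the codebook."""
    return [codebook[t] for t in tokens]


# Made-up codebook and "audio frames" purely for demonstration.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
frames = [(0.1, -0.2), (0.9, 0.8), (0.2, 1.1)]

tokens = quantize(frames, codebook)
print(tokens)                        # [0, 3, 2]
print(dequantize(tokens, codebook))  # [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
```

The language model then only has to predict a sequence of small integers like these, and a decoder turns them back into a waveform.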

The samples are outstanding. Even if they are cherry-picked (not saying they are, but even if), the output seems incredible.

https://ai.honu.io/papers/musicgen/

From the BigVGAN paper (https://arxiv.org/pdf/2206.04658.pdf):

"We introduce a proper inductive bias of periodicity to the generator by applying a recently proposed periodic activation called Snake function (Liu et al., 2020), defined as f_α(x) = x + (1/α) sin²(αx), where α is a trainable parameter that controls the frequency of the periodic component of the signal and larger α gives higher frequency. The use of sin²(x) ensures monotonicity and renders it amenable to easy optimization. Liu et al. (2020) demonstrates this periodic activation exhibits an improved extrapolation capability for temperature and financial data prediction."
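For anyone who wants to poke at the Snake function from that quote, here is a minimal scalar sketch in plain Python. The function names are mine, and α is a fixed argument here, whereas the paper describes it as a trainable parameter. The derivative works out to 1 + sin(2αx), which never drops below zero, and that is where the monotonicity claim comes from.

```python
import math


def snake(x: float, alpha: float = 1.0) -> float:
    """Snake activation from the quote: x + (1/alpha) * sin^2(alpha * x)."""
    if alpha == 0:
        raise ValueError("alpha must be non-zero")
    return x + math.sin(alpha * x) ** 2 / alpha


def snake_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative in x: 1 + 2*sin(ax)*cos(ax) = 1 + sin(2*a*x), always in [0, 2]."""
    return 1.0 + math.sin(2.0 * alpha * x)


# Larger alpha means a faster ripple riding on top of the identity line y = x.
for alpha in (0.5, 1.0, 4.0):
    samples = [round(snake(x / 2, alpha), 3) for x in range(5)]
    print(alpha, samples)
```

So the activation is the identity plus a bounded periodic ripple, which gives the network a built-in way to represent periodic structure without losing the easy optimization of a monotonic function.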