by posaune 824 days ago
Very cool idea. I’ve been curious about text-to-sound approaches that allow for more granularity rather than just producing fully formed tracks. I'd like to know more about the pitfalls of using Encodec with shorter samples. As a musician and software engineer, I could see lots of applications of this sample-first approach that don’t rely on the familiar DAW timeline/track UI, too.

Thank you so much! The biggest issue with Encodec (especially the 48kHz version) is that it is very dependent on normalization. This wasn't an issue for their use case (music), since music generally doesn't contain silent portions, but it is a problem for samples. Many one-shots and loops contain long stretches of silence or very quiet audio, which become essentially pure noise when normalized. Training our custom autoencoder to handle this issue was one of the key factors that enabled us to get such good audio quality.
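
To make the normalization problem concrete, here is a rough sketch in Python. It assumes simple per-chunk RMS normalization, which is an illustration and not necessarily what Encodec or our autoencoder does internally. The signal, the noise floor level, and the `rms_normalize` helper are all made up for the example.

```python
import numpy as np

def rms_normalize(chunk, eps=1e-8):
    """Scale a chunk to (roughly) unit RMS. Hypothetical helper, not Encodec's code."""
    scale = np.sqrt(np.mean(chunk ** 2)) + eps
    return chunk / scale, scale

rng = np.random.default_rng(0)
sr = 48_000
t = np.arange(sr) / sr

# Made-up one-shot: a decaying 220 Hz tone on top of a faint noise floor.
noise_floor = 1e-4 * rng.standard_normal(sr)
loud_chunk = 0.5 * np.sin(2 * np.pi * 220 * t) * np.exp(-8 * t) + noise_floor

# The tail of the sample: nothing left but the noise floor.
quiet_chunk = 1e-4 * rng.standard_normal(sr)

loud_norm, loud_scale = rms_normalize(loud_chunk)
quiet_norm, quiet_scale = rms_normalize(quiet_chunk)

# Extra gain the quiet chunk receives relative to the loud one, in dB.
extra_gain_db = 20 * np.log10(loud_scale / quiet_scale)
print(f"loud scale:  {loud_scale:.5f}")
print(f"quiet scale: {quiet_scale:.5f}")
print(f"quiet chunk is boosted by about {extra_gain_db:.1f} dB more")
```

Both chunks come out at the same level after normalization, so the near-silent tail reaches the model as full-scale noise rather than as something close to silence.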