Hacker News
by lapink 1410 days ago
Original Demucs author here. Thanks for putting forward our research!

I’m definitely happy to see more front ends for Demucs being developed and to read that it has been useful to other musicians!

We are working on the next iteration of the model, and with more sources, hopefully released by the end of the year :)

If you are interested in this research you can follow my Twitter (@honualx) or star the Demucs repo.

3 comments

I'm curious, what is the business justification for funding development of Demucs, if you don't mind me asking? It doesn't seem very related to FB's core business.
Solving problems like audio source separation (e.g. distinguishing multiple speakers in a noisy environment, or picking speech out of a background where music is playing) seems very much in FB's wheelhouse.
The goal of Meta AI Research is to do open research, not necessarily with direct applications at the time we start it. Indeed, the architecture or the lessons learnt working on it can become useful to the company later, for instance for remote presence in VR, to isolate the main speaker from their environment (https://arxiv.org/pdf/2206.15423.pdf).
Just a guess here, but I wouldn't be surprised if it's used to better spy on your Messenger audio conversations. They already listen in and will pick up keywords to populate your FB ad stream.
That’s absolutely not true. Facebook does not listen to your conversation: https://twitter.com/jspujji/status/1474797770871615497?s=20&...
If I can reconstruct your conversation (through other meta information), without listening to sounds of your voices, have I not listened to your conversation?
Hi, I just downloaded demucs yesterday and started using it. It's amazing! I really appreciate all the work you put into making it easy to install and understand.

Is there any chance you can disentangle guitar and keyboard? I work a lot with Grateful Dead music and I'd like to be able to pull Jerry's guitar out from the keyboard in live shows. Similarly, it would be cool if you could parse Shpongle into its constituent tracks, but I think that's probably impossible.

Is there something similar for separating different voices from spoken audio?
Yes, there are: you can have a look at https://github.com/etzinis/sudo_rm_rf for instance, for 2-speaker separation. There is also this one for 3 speakers: https://huggingface.co/speechbrain/sepformer-whamr