by czr 2415 days ago
they made an extended abstract for ismir: http://archives.ismir.net/ismir2019/latebreaking/000036.pdf

methodology is a separate u-net per instrument type, each predicting a soft mask in spectrogram space (time x frequency). they then apply that mask to the spectrogram of the input audio and invert the result back to a waveform. fairly standard.
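
rough sketch of the masking step, in case it helps. this is not their code: the u-net outputs are faked with scaled copies of the mixture magnitude, the function names (`soft_masks`, `apply_masks`) are made up for illustration, and the ratio-mask formula is just one common way to build a soft mask, assumed here rather than taken from the abstract.

```python
import numpy as np

def soft_masks(estimates, eps=1e-10):
    """Turn per-instrument magnitude estimates into ratio masks.

    estimates: list of (time, freq) non-negative arrays, one per instrument.
    Returns an array of shape (n_instruments, time, freq) whose values lie
    in [0, 1] and sum to roughly 1 across instruments in every bin.
    """
    power = np.stack(estimates) ** 2
    return power / (power.sum(axis=0, keepdims=True) + eps)

def apply_masks(mixture_spec, masks):
    """Scale the complex mixture spectrogram by each mask.

    The mask is real-valued, so each output keeps the mixture's phase.
    """
    return masks * mixture_spec[np.newaxis, ...]

# Toy complex "spectrogram" (time x frequency) standing in for an STFT.
rng = np.random.default_rng(0)
mixture_spec = rng.normal(size=(8, 5)) + 1j * rng.normal(size=(8, 5))

# ASSUMPTION: placeholder for the outputs of two per-instrument u-nets.
estimates = [np.abs(mixture_spec) * w for w in (0.7, 0.3)]

masks = soft_masks(estimates)
sources = apply_masks(mixture_spec, masks)
```

each entry of `sources` would then go through an inverse stft to get a waveform per instrument. since the masks sum to about 1 in every bin, the separated spectrograms add back up to the mixture.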