There is software like Spleeter [1] that can split songs into stems (separate drums, vocals, rhythm, etc.) using ML. I imagine it would be possible to adapt it for this. It might even work as-is if you only needed to isolate people's voices.
Adding audio is much easier than subtracting audio from a recording when said audio is distorted by imperfect speakers and recorded by imperfect microphones and has who knows what other background noises mixed in.
Said denoiser could easily destroy the audio you're looking to preserve, and the altered audio is not good-quality evidence of the events that occurred.
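To make the subtraction point concrete, here's a toy sketch in plain Python. Everything in it is made up for illustration: the tones standing in for "music" and "speech", the sample rate, and especially `colour()`, a 3-tap moving average that is only a crude stand-in for what a real speaker and microphone do to a signal. It subtracts the original music from two mixes, one where the music was added untouched and one where it was slightly filtered first, and measures how much music is left over.

```python
import math

RATE = 8000  # samples per second (arbitrary for this toy example)
N = 2000     # number of samples

def tone(freq, amp):
    """A pure sine tone, used here as a placeholder for real audio."""
    return [amp * math.sin(2 * math.pi * freq * t / RATE) for t in range(N)]

def energy(x):
    return sum(v * v for v in x)

def colour(x, taps=3):
    """Crude stand-in for speaker + mic distortion: a short moving average."""
    out = []
    for i in range(len(x)):
        window = x[max(0, i - taps + 1): i + 1]
        out.append(sum(window) / taps)
    return out

music = [a + b for a, b in zip(tone(440, 1.0), tone(1320, 0.5))]
speech = tone(150, 0.3)

# Mix 1: music added digitally, untouched. Mix 2: music "played in the room".
clean_mix = [m + s for m, s in zip(music, speech)]
room_mix = [m + s for m, s in zip(colour(music), speech)]

def leftover_music(mix):
    """Subtract the original music, then measure what remains besides speech."""
    cleaned = [x - m for x, m in zip(mix, music)]
    return energy([c - s for c, s in zip(cleaned, speech)])

clean_leftover = leftover_music(clean_mix)
room_leftover = leftover_music(room_mix)
print(clean_leftover, room_leftover)
```

With the untouched mix the leftover should be essentially zero; with even this mild filtering, a naive subtraction leaves a lot of music behind. A real room adds reverb, nonlinearity, and noise on top of that.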
I actually just mapped this out in my business-ideas notes folder, and yeah: I don’t see how you can build a decent tool for this job without having copies of the actual waveforms, which would require some form of commercial license. It would not need to be a performance license, though, which seems to be the kind the publishers are most familiar with licensing. I’m not sure whether that would make the process easier or harder, but I definitely don’t see how they could argue that the right to use the music without playing it is worth more than the right to play it, so in theory it should be cheaper than the per-track price iTunes and Spotify are paying.
It should be part of the YouTube post-upload processing. You find another YouTube link to the offending music, it syncs the two audio tracks, and removes the music from your upload.
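A rough sketch of what "sync the two tracks and remove" could look like, in plain Python. This is not how YouTube does anything; it's a toy under generous assumptions: the reference is an exact copy of the music in the upload (just shifted and scaled), the signals are short lists of samples, and the names `best_offset` and `remove_reference` are made up here. It finds the offset by brute-force cross-correlation, fits a single gain by least squares, and subtracts.

```python
import math
import random

def best_offset(recording, reference, max_lag):
    """Lag (in samples) at which reference correlates best with recording."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        segment = recording[lag: lag + len(reference)]
        score = sum(a * b for a, b in zip(segment, reference))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def remove_reference(recording, reference, max_lag):
    """Align reference to recording, fit one gain, and subtract it."""
    lag = best_offset(recording, reference, max_lag)
    segment = recording[lag: lag + len(reference)]
    # Least-squares gain: how loud the reference is inside the recording.
    gain = (sum(a * b for a, b in zip(segment, reference))
            / sum(b * b for b in reference))
    cleaned = list(recording)
    for i in range(len(segment)):
        cleaned[lag + i] -= gain * reference[i]
    return cleaned, lag, gain

# Toy data (all hypothetical): noise stands in for the music, a sine for speech.
rng = random.Random(0)
reference = [rng.gauss(0, 1) for _ in range(500)]
speech = [0.2 * math.sin(2 * math.pi * 150 * t / 8000) for t in range(800)]

recording = list(speech)
for i, r in enumerate(reference):
    recording[120 + i] += 0.6 * r  # music starts 120 samples in, at 60% volume

cleaned, lag, gain = remove_reference(recording, reference, max_lag=300)
print(lag, round(gain, 2))
```

In this idealised setup the recovered offset and gain should match what was mixed in, and `cleaned` ends up close to `speech`. With a real upload the music has been through compression, EQ, and possibly a room, so a single offset and gain would not be enough.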
1. https://github.com/deezer/spleeter