Hacker News
by WhitneyLand 2 hours ago
“there aren't great corpora of training data that would connect a MusicXML representation to sheet music images or to audio”

It may not be necessary… a lot of the training pairs for this could probably be generated procedurally in code.

Would be pretty fun to work on and see it come to life.
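
To make the "procedurally created" idea concrete, here is a minimal sketch of generating one paired example: a random melody written out as MusicXML plus a matching audio rendering. Everything in it is an assumption for illustration (the function names, the quarter-notes-only C major melody, the plain sine-wave "synth"); a real pipeline would use a proper engraver and synthesizer, and would also render the sheet music image.

```python
import math
import random
import xml.etree.ElementTree as ET

STEPS = ["C", "D", "E", "F", "G", "A", "B"]
SEMITONES = {"C": 0, "D": 2, "E": 4, "F": 5, "G": 7, "A": 9, "B": 11}


def random_melody(n_notes, seed):
    """Hypothetical generator: random naturals in octave 4, all quarter notes."""
    rng = random.Random(seed)
    return [(rng.choice(STEPS), 4) for _ in range(n_notes)]


def to_musicxml(melody, beats_per_measure=4):
    """Serialize the melody as a minimal single-part MusicXML score."""
    score = ET.Element("score-partwise", version="3.1")
    part_list = ET.SubElement(score, "part-list")
    score_part = ET.SubElement(part_list, "score-part", id="P1")
    ET.SubElement(score_part, "part-name").text = "Melody"
    part = ET.SubElement(score, "part", id="P1")
    for start in range(0, len(melody), beats_per_measure):
        number = str(start // beats_per_measure + 1)
        measure = ET.SubElement(part, "measure", number=number)
        if start == 0:
            # The first measure carries divisions, time signature, and clef.
            attrs = ET.SubElement(measure, "attributes")
            ET.SubElement(attrs, "divisions").text = "1"
            time = ET.SubElement(attrs, "time")
            ET.SubElement(time, "beats").text = str(beats_per_measure)
            ET.SubElement(time, "beat-type").text = "4"
            clef = ET.SubElement(attrs, "clef")
            ET.SubElement(clef, "sign").text = "G"
            ET.SubElement(clef, "line").text = "2"
        for step, octave in melody[start:start + beats_per_measure]:
            note = ET.SubElement(measure, "note")
            pitch = ET.SubElement(note, "pitch")
            ET.SubElement(pitch, "step").text = step
            ET.SubElement(pitch, "octave").text = str(octave)
            ET.SubElement(note, "duration").text = "1"
            ET.SubElement(note, "type").text = "quarter"
    return ET.tostring(score, encoding="unicode")


def pitch_to_hz(step, octave):
    """Equal temperament with A4 = 440 Hz (MIDI note 69)."""
    midi = 12 * (octave + 1) + SEMITONES[step]
    return 440.0 * 2 ** ((midi - 69) / 12)


def render_sine(melody, sample_rate=8000, seconds_per_note=0.25):
    """Stand-in synth: one pure sine tone per note, as floats in [-1, 1]."""
    samples = []
    n = int(sample_rate * seconds_per_note)
    for step, octave in melody:
        freq = pitch_to_hz(step, octave)
        samples.extend(
            math.sin(2 * math.pi * freq * i / sample_rate) for i in range(n)
        )
    return samples


def make_pair(seed, n_notes=8):
    """One training pair: (MusicXML text, audio samples) for the same melody."""
    melody = random_melody(n_notes, seed)
    return to_musicxml(melody), render_sine(melody)
```

Calling `make_pair(seed)` over a range of seeds would give as many aligned pairs as you want, and the seed makes each pair reproducible.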

1 comment

I'd imagine that rendered audio that just used MIDI voices (even high-quality "Real Instruments" MIDI voices) would make for pretty brittle training data for, e.g., stem separation or automatic transcription. In the best case, I think you'd start with a clean digital representation, render sheet music imagery, and then have lots of recordings by a bunch of real instrumentalists playing the same music.

On the topic of stem separation, I've wondered about creating a quasi-synthetic dataset by taking chunks of recordings by real musicians, playing them back in a real space in various combinations, and recording the resulting analog-blended cacophony. You could repeat this in various environments like cathedrals, basement bars, etc. for realism :-)
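
The bookkeeping side of that idea could be sketched roughly like this: decide which stems get played back together in which room, so that every re-recorded mix has known ground-truth sources. The function name, parameters, and stem/room labels are all hypothetical, and the actual playback and recording would still happen in the physical space; this only produces the session plan.

```python
import itertools
import random


def playback_plan(stems, rooms, min_stems=2, max_stems=3, per_room=4, seed=0):
    """Pick random stem combinations to play back and re-record in each room.

    Each entry names the room and the stems to play simultaneously; the
    listed stems are the ground truth for the mix recorded in that session.
    """
    rng = random.Random(seed)
    # Every combination of min_stems..max_stems distinct stems.
    combos = [
        combo
        for k in range(min_stems, max_stems + 1)
        for combo in itertools.combinations(sorted(stems), k)
    ]
    plan = []
    for room in rooms:
        # Sample without replacement so a room never repeats a combination.
        for combo in rng.sample(combos, min(per_room, len(combos))):
            plan.append({"room": room, "stems": list(combo)})
    return plan
```

For example, `playback_plan(["bass", "drums", "sax", "vocals"], ["cathedral", "basement bar"])` would yield a list of sessions, each pairing a room with two or three stems to play back at once.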