by peatmoss
1 hour ago
I'd imagine that training data made of rendered audio using only MIDI voices (even high-quality "Real Instruments" MIDI voices) would make models for, e.g., stem separation or automatic transcription pretty brittle. In the best case, I think you'd start with a clean digital representation, render sheet-music images from it, and then collect lots of recordings of many different real instrumentalists playing the same music.

On the topic of stem separation, I've wondered about creating a quasi-synthetic dataset by taking chunks of recordings of real musicians, playing them back in a real space in various combinations, and recording the resulting analog-blended cacophony. You could repeat this in various environments like cathedrals, basement bars, etc. for realism :-)
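A rough, purely digital stand-in for that idea is to convolve each stem with a room impulse response instead of physically re-recording it, then sum every combination of stems into a mixture paired with its per-stem targets. Here's a minimal stdlib-only Python sketch. The function names, the `stems` and `room_irs` dictionaries, and the use of impulse responses are my own assumptions, not part of the proposal above. Unlike a real re-recording it is strictly linear, so it misses speaker and mic coloration, bleed, and other analog effects, which is presumably the point of doing it physically.

```python
from itertools import combinations


def convolve(signal, ir):
    """Direct O(n*m) convolution: the digital analogue of playing `signal`
    in a room whose impulse response is `ir`."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(ir):
            out[i + j] += s * h
    return out


def make_examples(stems, room_irs, min_sources=2):
    """Yield (room, combo, mixture, targets) for every combination of at
    least `min_sources` stems, placed in every simulated room.

    `stems` maps a (hypothetical) stem name to a list of samples;
    `room_irs` maps a room name to its impulse response.
    """
    names = sorted(stems)
    for k in range(min_sources, len(names) + 1):
        for combo in combinations(names, k):
            for room, ir in room_irs.items():
                # Reverberate each stem on its own, so the separation
                # targets carry the same room sound as the mixture.
                wet = {n: convolve(stems[n], ir) for n in combo}
                length = max(len(w) for w in wet.values())
                # Sum the wet stems sample by sample, zero-padding shorter ones.
                mixture = [
                    sum(w[t] if t < len(w) else 0.0 for w in wet.values())
                    for t in range(length)
                ]
                yield room, combo, mixture, wet
```

With three stems and two rooms, this yields 4 stem combinations (three pairs plus the full trio) times 2 rooms, i.e. 8 mixture/target examples.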