by CornCobs
1740 days ago
I'm working in a similar domain, music transcription. The challenge is to estimate note values (how many beats is a note supposed to last, as written in the score?), and I'm not sure what a good way to measure transcription accuracy would be. The naive note error rate cannot capture whether my model successfully detects particular musical structures such as syncopation or dotted rhythms.
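To make that limitation concrete, here is a minimal Python sketch. It rests on assumptions that are not from my actual setup: notes are already aligned one-to-one between reference and prediction, note values are given in beats, and the helper names (`note_error_rate`, `is_dotted`, `dotted_recall`) are made up for illustration. It compares the naive error rate with a recall computed only over dotted notes.

```python
from fractions import Fraction


def note_error_rate(reference, predicted):
    """Fraction of aligned notes whose predicted value differs from the reference."""
    if len(reference) != len(predicted):
        raise ValueError("this sketch assumes a one-to-one note alignment")
    wrong = sum(r != p for r, p in zip(reference, predicted))
    return wrong / len(reference)


def is_dotted(value):
    """True for singly dotted values: 3/2 times a power of two (3/4, 3/2, 3, ...)."""
    base = Fraction(value) * Fraction(2, 3)
    n, d = base.numerator, base.denominator
    # A power of two has exactly one bit set, so x & (x - 1) is zero.
    return (n & (n - 1)) == 0 and (d & (d - 1)) == 0


def dotted_recall(reference, predicted):
    """Share of dotted reference notes that were predicted exactly; None if there are none."""
    dotted = [(r, p) for r, p in zip(reference, predicted) if is_dotted(r)]
    if not dotted:
        return None
    return sum(r == p for r, p in dotted) / len(dotted)


# Hypothetical bar: a dotted-eighth/sixteenth figure followed by six quarter notes.
reference = [Fraction(3, 4), Fraction(1, 4)] + [Fraction(1)] * 6
# Hypothetical prediction that flattens the dotted figure into two even eighths.
predicted = [Fraction(1, 2), Fraction(1, 2)] + [Fraction(1)] * 6

print(note_error_rate(reference, predicted))  # overall error rate
print(dotted_recall(reference, predicted))    # recall on dotted notes only
```

In this toy case only two of the eight notes are wrong, yet the single dotted figure is missed entirely, which is the kind of failure the aggregate number hides. Syncopation would need beat positions as well as durations, so it is left out of the sketch.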
|
I'm wondering what the higher convolution levels could look like, if this were a CNN analyzing an image. Something between the complete Ableton/Logic export and a MIDI file. Being able to capture the "feel" of a song (or a section within a song) strikes me as an important milestone towards designing really good generative music.