Shame that it uses quadratically scaling transformers - there are many sub-quadratic transformers that work just as well or better (https://github.com/lucidrains?tab=repositories) - because that 4-second sub-sample limitation seems quite unlike how I imagine most people experience music. Interesting, though. I wonder if I could take a stab at this.

Also interesting that the absolute timing of onsets worked better than relative timing - that seems kinda bizarre to me, since, when I listen to music, it is never in absolute terms (e.g. "wow I just loved how this connects to the start of the 12th bar" vs "wow I loved that transition from what was playing 2 bars ago").

Another thing on relative timing: when I listen to music, very nuanced, gradual, and intentional deviations of tempo have significant sentimental effects for me - which suggests that you need a 'covariant' description of how the tempo changes over time. So not only do you need the relative timing of events, you also need the relative timing of the relative timing of events.

Some examples:

- Jonny Greenwood's Phantom Thread II from the Phantom Thread soundtrack [0]
- the breakdown in Holy Other's amazing "Touch" [1], where the song basically grinds to a halt before releasing all the pent-up emotional potential energy.

[0] https://www.youtube.com/watch?v=ztFmXwJDkBY, especially just before the violin starts at 1:04

[1] https://www.youtube.com/watch?v=OwyXSmTk9as, around 2:20
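
To make the "relative timing of the relative timing" idea concrete, here is a minimal sketch. It is not taken from the model under discussion. The function names and the onset values are made up for illustration, and it assumes onsets are given in seconds and strictly increasing (so no chords with identical onset times).

```python
def inter_onset_intervals(onsets):
    """First-order relative timing: gap between consecutive onsets."""
    return [b - a for a, b in zip(onsets, onsets[1:])]


def interval_ratios(intervals):
    """Second-order relative timing: each interval divided by the previous one.

    1.0 means steady tempo, above 1.0 means slowing down, below 1.0 speeding up.
    Assumes every interval is non-zero.
    """
    return [b / a for a, b in zip(intervals, intervals[1:])]


# Hypothetical onsets (seconds): a steady pulse followed by a ritardando.
onsets = [0.0, 0.5, 1.0, 1.5, 2.25, 3.375]

iois = inter_onset_intervals(onsets)   # [0.5, 0.5, 0.5, 0.75, 1.125]
ratios = interval_ratios(iois)         # [1.0, 1.0, 1.5, 1.5]

# Intervals do not change if the whole passage starts later.
shifted = inter_onset_intervals([t + 10.0 for t in onsets])

# Ratios do not change if the whole passage is played at half speed.
stretched = interval_ratios(inter_onset_intervals([2 * t for t in onsets]))
```

The intervals ignore where in the piece the passage sits, and the ratios additionally ignore the overall tempo, so the gradual slowdown shows up as a constant 1.5 regardless of how fast the passage is played.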
Rubato is everywhere in classical music, and understanding rubato is an essential part of any automatic transcription system that aims to show you notes in musically meaningful units of time.