It once again completely fails on an extremely simple test: look at a screenshot of sheet music, and tell me what the notes are. Producing a MIDI file for it (unsurprisingly) was far beyond its capabilities.
Interpreting sheet music images is very complex, and I’m not surprised general-purpose LLMs totally fail at it. It’s orders of magnitude harder than text OCR, due to the two-dimensional-ness.
For much better results, use a custom trained model like the one at Soundslice: https://www.soundslice.com/sheet-music-scanner/