Hacker News new | ask | show | jobs
by bravura 585 days ago
I know the reported scores of MT3 are very good, but have you had success with using it yourself?

https://replicate.com/turian/multi-task-music-transcription

I ported their colab to runtime so I could use it more easily.

The MIDI output is... puzzling?

I've tried feeding it even simple stems and found the output unusable for some tracks, i.e. the MIDI output and audio were not well aligned and there were timing issues. On other audio it seemed to work fine.

2 comments

Multi-track transcription has a long way to go before it seriously useful for real-world applications. Ultimately I think that converting audio into MIDI makes a lot more sense for piano/guitar transcription than it does for complex multi-instrument works with sound effects ect...

Luckily for me, audio-to-seq approaches do work very well for piano, which turns out to be an amazing way of getting expressive MIDI data for training generative models.

I developed https://pyaar.ai, it uses MT3 under the hood. I realized that continuous string instruments (guitar) that have things like slides, bends are quite difficult to capture in MIDI. Piano works much better because it's more discrete (the keys abstract away the strings) and so the MIDI file has better representation
> I realized that continuous string instruments (guitar) that have things like slides, bends are quite difficult to capture in MIDI.

It's just pitch bend?

I think trying to transcribe as MIDI is just a fundamentally flawed approach that has too many (well known) pitfalls to be useful.

A trained human can listen to a piece and transcribe it in seconds, but programming it as MIDI could take minutes/hours. If you're not trying to replicate how humans learn by ear, you're probably approaching this wrong.