Hacker News
by minihat 433 days ago
How is it possible that text-to-score/notation is lagging behind text-to-audio in music generation? Generating audio seems wildly more complicated!

Since you are working in this space, I wonder if you could comment on my pet theories for why this is true:

1. Not enough training data (scores are not available for most songs).
2. Difficulty tokenizing musical notation compared with audio.
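To make the tokenization question concrete, here is a minimal sketch that assumes a simple, hypothetical event vocabulary (`PITCH_`, `DUR_`, `WAIT_` tokens at 4 ticks per quarter note). It is not the scheme used by aria or any other particular model. Even a naive scheme turns a short melody into a handful of discrete tokens, while the same passage as raw audio is tens of thousands of samples. The sketch ignores voices, ties, dynamics and layout, which is plausibly where notation gets harder to tokenize in practice.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Note:
    pitch: int     # MIDI pitch number (60 = middle C)
    start: int     # onset in ticks (hypothetical: 4 ticks per quarter note)
    duration: int  # length in ticks


def tokenize(notes: list[Note]) -> list[str]:
    """Flatten notes into a hypothetical event-token sequence."""
    tokens = ["<BOS>"]
    time = 0
    # Order by onset, then pitch, so chords come out low to high.
    for note in sorted(notes, key=lambda n: (n.start, n.pitch)):
        if note.start > time:
            # Advance the clock to the next onset.
            tokens.append(f"WAIT_{note.start - time}")
            time = note.start
        tokens.append(f"PITCH_{note.pitch}")
        tokens.append(f"DUR_{note.duration}")
    tokens.append("<EOS>")
    return tokens


def audio_samples(num_quarters: int, bpm: int = 120, sample_rate: int = 44_100) -> int:
    """Number of raw audio samples for the same passage at the given tempo."""
    seconds = num_quarters * 60 / bpm
    return int(seconds * sample_rate)


if __name__ == "__main__":
    # C major arpeggio: three quarter notes.
    arpeggio = [Note(60, 0, 4), Note(64, 4, 4), Note(67, 8, 4)]
    tokens = tokenize(arpeggio)
    print(tokens)
    print(len(tokens), "tokens vs", audio_samples(3), "audio samples")
```

For the three-note arpeggio this gives 10 tokens versus 66150 samples at 44.1 kHz. That is why sequence length alone does not obviously make notation the harder target.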

2 comments

Mostly 1, I think. There are a few open-source efforts doing what you mentioned, e.g. https://github.com/EleutherAI/aria
3. A smaller market, so fewer people are trying to solve it.