by dimatura 476 days ago
I definitely see this happening. Music generation has lagged behind image generation but is following more or less the same path. Early image generation models were completely unconditional; all you could do was sample an image. Then coarse conditioning methods such as text prompts and depth images came along; then additional tooling to tune images in a more fine-grained way.

That said, music differs from images in that it also has a "symbolic" level that is closer to text than to images [1]. There's other work out there that uses LLM-type tools for direct melody generation (no audio). And of course, there are lyrics. I do expect commercial tools to start integrating all these capabilities gradually; it's just a matter of time.

[1] I guess there are also vector images (like SVG) - I've seen work on generating those as well, though it's less mature than directly generating pixels.