by dimatura
476 days ago
I definitely see this happening. Music generation has lagged behind image generation but is following more or less the same path. Early image generation models were completely unconditional; all you could do was sample an image. Then coarse conditioning methods such as text prompts and depth images came along, followed by additional tooling to tune images in a more fine-grained way. That said, music differs from images in that it also has a "symbolic" level that is closer to text than to images [1]. There's other work out there that uses LLM-type tools for direct melody generation (no audio). And of course, there are lyrics. I do expect commercial tools to start integrating all these capabilities gradually; it's just a matter of time.

[1] I guess there are also vector images (like SVG) - I've seen work on generating those as well, though it's less mature than directly generating pixels.