This is cool! The Docker image made this easy to try out. What's the reason for the 30s limit? Would it be possible to generate bars and stitch them together?
There's a model version that can generate music conditioned not only on a natural language prompt but also on an existing piece of audio, so it's possible to generate 10s chunks where each chunk is conditioned on the previous one.
The challenge is that this version is hard to export to ONNX format, which is what would let it run outside of a Python machine learning framework.
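To make the chunking idea concrete, here's a rough sketch of the loop, not the project's actual code. `generate_chunk` is a hypothetical callable standing in for the audio-conditioned model (it takes the prompt plus the previous chunk's samples and returns new samples), and the 32000 Hz default sample rate is an assumption too.

```python
from typing import Callable, List

Audio = List[float]  # mono samples


def generate_long(
    prompt: str,
    generate_chunk: Callable[[str, Audio], Audio],  # hypothetical model wrapper
    total_seconds: int,
    chunk_seconds: int = 10,
    sample_rate: int = 32000,  # assumed value, adjust to the real model
) -> Audio:
    """Build a long clip from fixed-size chunks, each conditioned on the previous one."""
    chunk_len = chunk_seconds * sample_rate
    target_len = total_seconds * sample_rate
    track: Audio = []
    previous: Audio = []  # empty for the first chunk, so it uses the text prompt only

    while len(track) < target_len:
        # Ask the model for the next chunk, conditioned on the one before it.
        chunk = generate_chunk(prompt, previous)[:chunk_len]
        if not chunk:
            raise ValueError("generate_chunk returned no samples")
        track.extend(chunk)
        previous = chunk

    # Trim any overshoot from the final chunk.
    return track[:target_len]
```

This just concatenates the chunks. In practice you'd probably want a short crossfade at each boundary to hide seams, but that depends on how the model handles the conditioning audio.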