
The limitation comes from the underlying model, which can only generate up to 30 seconds of audio. More info about that here: https://huggingface.co/docs/transformers/en/model_doc/musicg...

There's a version of the model that can condition generation not only on a natural-language prompt but also on another piece of music. That makes it possible to generate 10 s chunks where each chunk is conditioned on the previous one.
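A rough sketch of that chaining loop, assuming a hypothetical `generate_chunk(prompt, context, n_samples)` wrapper around the music-conditioned model (not a real library API). Only the tail of the previous audio is fed back as conditioning, since any conditioning audio also eats into the model's fixed 30 s window. The 32 kHz sample rate and 2 s context length are illustrative assumptions.

```python
from typing import Callable, List, Optional

SAMPLE_RATE = 32_000   # assumption: output sample rate, illustrative only
CHUNK_SECONDS = 10     # length of each generated chunk
CONTEXT_SECONDS = 2    # hypothetical: tail of previous audio used as conditioning

Audio = List[float]
# Hypothetical signature: (text prompt, conditioning audio or None, samples to produce) -> new audio
ChunkGenerator = Callable[[str, Optional[Audio], int], Audio]


def generate_long(prompt: str, total_seconds: int, generate_chunk: ChunkGenerator) -> Audio:
    """Build a clip longer than the model's limit by chaining fixed-length chunks.

    The first chunk is conditioned on the text prompt alone; every later chunk
    is also conditioned on the last CONTEXT_SECONDS of audio generated so far.
    """
    out: Audio = []
    context: Optional[Audio] = None
    n_chunks = -(-total_seconds // CHUNK_SECONDS)  # ceiling division
    for _ in range(n_chunks):
        chunk = generate_chunk(prompt, context, CHUNK_SECONDS * SAMPLE_RATE)
        out.extend(chunk)
        # Keep only the tail so the conditioning stays well inside the model's window.
        context = out[-CONTEXT_SECONDS * SAMPLE_RATE:]
    # The last chunk may overshoot, so trim to the requested duration.
    return out[: total_seconds * SAMPLE_RATE]
```

With a real model, `generate_chunk` would wrap the music-conditioned variant's generation call. Abstracting it this way just keeps the chaining logic separate from whatever runtime (PyTorch, or ONNX if the export ever works) actually runs the model.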

The challenge with that model is that it's hard to export to ONNX format, which is what would let it run outside a Python machine learning framework.