by gsuuon 755 days ago
This is cool! The docker image made this easy to try out. What's the reason for the 30s limit? Would it be possible to generate bars and stitch them together?

The limitation comes from the underlying model, which can only generate up to 30s of audio. More info about that here: https://huggingface.co/docs/transformers/en/model_doc/musicg...

There's a model version that can generate music conditioned not only on natural-language prompts but also on other pieces of music. That makes it possible to generate 10s chunks where each chunk is conditioned on the previous one.
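Here's a rough sketch of that chunk-by-chunk loop in Python. `generate_chunk` is a hypothetical stand-in for the actual model call, not a real MusicGen or transformers API. The sketch assumes it takes the text prompt, the tail of the previous chunk (or `None` for the first chunk) and a sample count, and returns only the newly generated samples. The 2s context window and the sample rate are illustrative assumptions too.

```python
from typing import Callable, List, Optional

# Hypothetical model call: (prompt, context audio or None, num_samples) -> new samples.
ChunkGenerator = Callable[[str, Optional[List[float]], int], List[float]]


def generate_long(
    prompt: str,
    generate_chunk: ChunkGenerator,
    total_seconds: float,
    chunk_seconds: float = 10.0,
    context_seconds: float = 2.0,
    sample_rate: int = 32000,  # assumed output rate; use whatever the model produces
) -> List[float]:
    """Build a long clip by generating fixed-size chunks, each conditioned on the
    tail of the previous chunk, then concatenating them."""
    samples_per_chunk = int(chunk_seconds * sample_rate)
    context_len = int(context_seconds * sample_rate)
    total = int(total_seconds * sample_rate)
    if samples_per_chunk <= 0:
        raise ValueError("chunk_seconds * sample_rate must be positive")

    out: List[float] = []
    context: Optional[List[float]] = None  # first chunk has only the text prompt
    while len(out) < total:
        chunk = generate_chunk(prompt, context, samples_per_chunk)
        if not chunk:
            raise ValueError("generator returned no samples")
        out.extend(chunk)
        # Condition the next chunk on the last few seconds of this one.
        # (Guard needed because chunk[-0:] would return the whole chunk.)
        context = chunk[-context_len:] if context_len > 0 else None
    return out[:total]  # trim the overshoot from the final chunk
```

Passing only the tail of the previous chunk keeps the conditioning input short. A real setup would likely also crossfade chunk boundaries to hide seams.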

The challenge with that model is that it's hard to export it to ONNX format so that it can run outside of a Python machine learning framework.