Show HN: MusicGPT – An Open Source App for Generating Music with Local LLMs (github.com)
7 points by gabimtme 756 days ago
3 comments

Hi HN! Author here, I wanted to show off the latest side hustle that I've been cooking for the past few months.

This is a terminal application that runs the latest AI models for music generation locally, using the CPU or GPU of the device, without needing heavy dependencies like Python or machine learning frameworks. It works seamlessly on Linux, Mac and Windows, with a binary size of just ~30 MB for the non-GPU versions.

The app works like this:

- It accepts a natural language prompt from the user

- Generates a music sample conditioned by the prompt

- Encodes the generated sample into .wav format and plays it on the device
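
For readers curious about the last step, here is a minimal sketch of encoding raw samples as a 16-bit PCM .wav file. It is written in Python with only the standard library for brevity and is not the app's actual code. `fake_generate` is a hypothetical stand-in for the model (it just returns a sine wave), the 32 kHz sample rate is an assumption, and playback is left out.

```python
import io
import math
import struct
import wave

SAMPLE_RATE = 32_000  # assumed sample rate; adjust to whatever the model emits


def fake_generate(prompt: str, seconds: float) -> list[float]:
    """Hypothetical stand-in for the model: a 440 Hz sine wave in [-1, 1]."""
    n = int(seconds * SAMPLE_RATE)
    return [0.5 * math.sin(2 * math.pi * 440 * i / SAMPLE_RATE) for i in range(n)]


def encode_wav(samples: list[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode mono float samples as 16-bit little-endian PCM WAV bytes."""
    pcm = b"".join(
        # Clamp to [-1, 1] before scaling so out-of-range samples cannot overflow.
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767))
        for s in samples
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)   # mono
        w.setsampwidth(2)   # 2 bytes per sample = 16-bit
        w.setframerate(sample_rate)
        w.writeframes(pcm)
    return buf.getvalue()


wav_bytes = encode_wav(fake_generate("lofi beat", seconds=1.0))
```

Writing `wav_bytes` to a file gives something any audio player can open.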

Additionally, it ships a UI that allows interacting with the AI models in a chat-like web application, storing chat history and generated music on the device.

The vision for the project is that it will eventually generate infinite music streams in real time, for example an endless stream of always-new LoFi songs to listen to while coding, but it's not quite there yet...

Hope you like it!

I'm very glad this has finally become available. I immediately asked it to do what I've wanted for almost a decade: generate 30 minutes of Bach Cello Suite No.1 in G Prelude[1] (I want to listen to it for hours, but I want neither a loop nor the whole original, only the prelude; subtle variations are welcome). The result was 9 seconds of ear horror. Do I need better hardware to get a listenable result (I ran it on a Ryzen laptop)?

[1] https://www.youtube.com/watch?v=mGQLXRTl3Z0

The model has some limitations, for example:

- it can only generate 30 seconds of audio

- there are performance differences between music genres

You can read more about the limitations here: https://huggingface.co/facebook/musicgen-small

This is cool! The Docker image made this easy to try out. What's the reason for the 30s limit? Would it be possible to generate bars and stitch them together?

The limitation comes from the underlying model, which can only generate up to 30s. More info about that here: https://huggingface.co/docs/transformers/en/model_doc/musicg...

There's a model version that is able to generate music conditioned not only on natural language prompts, but also on other pieces of music, so it's possible to generate chunks of 10s where each chunk is generated based on the previous one.
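
To make the chunking idea concrete, here is a rough sketch of the loop in Python. It assumes a hypothetical `generate(prompt, previous, n_samples)` callback standing in for an audio-conditioned model; nothing here is a real MusicGPT or Transformers API. The stand-in generator just continues a numeric ramp so the stitching is easy to check.

```python
from typing import Callable, List, Optional

Samples = List[float]
# Hypothetical signature: (prompt, previous chunk or None, samples wanted) -> new chunk
Generator = Callable[[str, Optional[Samples], int], Samples]


def generate_long(
    prompt: str,
    total_seconds: float,
    generate: Generator,
    chunk_seconds: float = 10.0,
    sample_rate: int = 32_000,
) -> Samples:
    """Build a long clip from short chunks, each conditioned on the previous one."""
    total_samples = int(total_seconds * sample_rate)
    chunk_samples = int(chunk_seconds * sample_rate)
    if chunk_samples <= 0:
        raise ValueError("chunk_seconds * sample_rate must be at least 1 sample")

    out: Samples = []
    previous: Optional[Samples] = None  # the first chunk has nothing to condition on
    while len(out) < total_samples:
        n = min(chunk_samples, total_samples - len(out))
        chunk = generate(prompt, previous, n)
        out.extend(chunk)
        previous = chunk  # the next chunk is conditioned on this one
    return out


calls: List[int] = []


def fake_generate(prompt: str, previous: Optional[Samples], n: int) -> Samples:
    """Stand-in model: continues counting from where the previous chunk ended."""
    calls.append(n)
    start = previous[-1] + 1.0 if previous is not None else 0.0
    return [start + i for i in range(n)]


# 25 "seconds" at a toy rate of 4 samples/s: chunks of 40, 40 and 20 samples.
audio = generate_long("lofi beat", 25, fake_generate, chunk_seconds=10, sample_rate=4)
```

A real implementation would probably also crossfade at chunk boundaries, since a model conditioned on the previous chunk is not guaranteed to continue it sample-accurately.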

The challenge with that model is that it's hard to export it in ONNX format so that it can be run outside of a machine learning framework in Python.