by stillpointlab 311 days ago
I really hope we move on from these boil-the-ocean models. I want something more collaborative and even iterative.

I was having a conversation with a former bandmate. He was talking about a bunch of songs he is working on. He can play guitar, a bit of bass and can sing. That leaves drums. He wants a model where he can upload a demo and it either returns a stem for a drum track or just combines his demo with some drums.

Right now these models are more like slot machines than tools. If you have the money and the time/patience, perhaps you can do something with them. But I am looking forward to when we start getting collaborative, interactive and iterative models.

6 comments

Very well said. I'm in the same boat. I'd love AI to write down a drum groove or a drum fill based on my guitar riff.

Currently, all these AI tools generate the whole song, which I'm not at all interested in, given that songwriting is so much fun.

RIP session musicians if that ever comes to pass; session work is one of the main ways to make money if you are a good drummer.
Most VST drum sequencers have pretty powerful groove libraries nowadays. It's not a model or anything like that, but just mixing, matching and slightly modifying the patterns gives extremely good results.
Do you have a point of view on this type of collaborative approach applied to other areas, for example, collective understanding for groups of people? We are working on something in that space.
The amount I have to say on this topic would be inappropriate for a Hacker News comment, but I can offer some brief and unstructured thoughts.

For collaboration I believe that _lineage_ is important. Not just a one-shot output artifact but a series of outputs linked together in some kind of graph. It is the difference between a single intervention/change vs. a _process_. This provides a record which can act as an audit trail. In this "lineage" as I would call it, there are conversations with LLMs (prompts + context) and there are outputs.

Let's imagine the original topic, audio, with the understanding that the abstract idea could apply to anything (including mental health). I have a conversation with an LLM about some melodic ideas and the output is a score. I take the score and add it as context to a new conversation with an LLM and the output is a demo. I take the demo and the score then add it to a new conversation with an LLM and the output is a rhythm section. etc.

What we are describing here is an evolving _process_ of collaboration. We change our view from "I did this one thing, here is the result" to "I am _doing_ this set of things over time".

The output of that "doing" is literally a graph. You have multiple inputs to each node (conversation/context) which can be traced back to initial "seed" elements.

From a collaborative perspective, each node in this graph is somewhat independent. One person can create the score. Another person can take the score and create a demo. etc.
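
To make the idea a bit more concrete, here is a minimal sketch of such a lineage graph in Python. Everything in it is an assumption made for illustration: the `Node` class, the `seeds` and `audit_trail` helpers, and the author names are hypothetical, and no real model is called. Each node stands for one conversation (prompt + context) together with the output it produced.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One step in the lineage: a conversation and the output it produced."""
    name: str      # the output artifact, e.g. "score"
    author: str    # who ran this step (hypothetical collaborators)
    prompt: str    # what was asked of the model
    inputs: list = field(default_factory=list)  # earlier nodes used as context

def seeds(node):
    """Trace a node back to its seed elements (nodes with no inputs)."""
    if not node.inputs:
        return {node.name}
    found = set()
    for parent in node.inputs:
        found |= seeds(parent)
    return found

def audit_trail(node, seen=None):
    """List every step that contributed to `node`, parents first, each once."""
    seen = set() if seen is None else seen
    trail = []
    for parent in node.inputs:
        trail += audit_trail(parent, seen)
    if node.name not in seen:
        seen.add(node.name)
        trail.append(f"{node.name} (by {node.author})")
    return trail

# The audio example: ideas -> score -> demo -> rhythm section.
ideas = Node("melodic ideas", "alice", "seed material")
score = Node("score", "alice", "turn these ideas into a score", [ideas])
demo = Node("demo", "bob", "render a demo from the score", [score])
rhythm = Node("rhythm section", "carol", "add a rhythm section", [demo, score])

print(seeds(rhythm))
print("\n".join(audit_trail(rhythm)))
```

The rhythm section node takes both the demo and the score as inputs, so walking the graph backwards reaches the original melodic ideas through two paths while still listing each step only once. Different people can own different nodes, which is the sense in which each node is somewhat independent.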

Suno can already do that
I’d recommend GarageBand for this.
I haven't used the virtual drummer feature of GarageBand recently, but my experience with it was pretty disappointing. The output sounds very MIDI, or like the most basic loops.

I believe there is massive room for improvement over what is currently available.

However, my larger point isn't "I want to do this one particular thing" but rather: I wish the music model companies would divert some attention away from "prompt a complete song in one shot" and towards "provide tools to iteratively improve songs in collaboration with a musician/producer".