HN Mirror

by Animats 2154 days ago

All the current music AI projects give results which may sound "good" to a casual listener, but they sound horribly wrong to any educated listener. The reason is that AI can only imitate the surface, but completely fails to recognize/synthesize larger structures.

Lack of "larger structures" is the key here. That's where GPT-1 was. Each sentence, in isolation, seemed to make sense, but after a few lines, it was clear the text wasn't going anywhere. By GPT-2, paragraphs seemed semi-reasonable, but multiple paragraphs didn't hold together. GPT-3 is able to keep it together for a few paragraphs, but probably not for a book chapter.

Music synthesis has the same scaling issue. Generators that imitate known patterns work for a few bars, but after a while you realize the music is going nowhere. The GPT results on text suggest that scaling up may fix that problem.