by vessenes
476 days ago
This is primarily architecturally interesting, in my opinion. Output songs have unusual, noticeable artifacts, and I would guess they become more noticeable the more you listen. That said, wow. An end-to-end FAST architecture that can infer a 4.5-minute song in 10 seconds (roughly 27x faster than real time) is a compelling thing. I didn’t see whether we got open weights, but my guess is that this is not crazy challenging to train, and some v2/v3 versions of this are likely to be good-to-very-good.