by nowayno583
651 days ago
Intuitively, audio is way more sensitive to phase and persistence because it lives in the time domain. So maybe audio models look more like video models than image models? I'm not really sure how current video-generating models work, but maybe we could get some insight into them by looking at how current audio models work? I think we are looking at an autoregression of autoregressions of sorts, where each PSD + phase frame is used to output the next, right? Probably with different-sized windows of persistence as "tokens". But I'm way out of my depth here!

In images, scrambling the phase yields a completely different image. A single edge has the same magnitude spectrum as pink/brown-ish noise, yet the two look nothing alike.
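
For anyone who wants to poke at this, here is a minimal sketch of the phase-scrambling point using a 1-D "edge" (a step signal) instead of a full image. It assumes NumPy; the helper name `scramble_phase`, the signal length, and the seed are my own choices for illustration, not anything from a particular library or paper.

```python
import numpy as np


def scramble_phase(signal, rng):
    """Keep the magnitude spectrum of a real signal, randomize its phase."""
    spectrum = np.fft.rfft(signal)
    magnitude = np.abs(spectrum)
    phase = rng.uniform(-np.pi, np.pi, size=spectrum.shape)
    # The DC bin (and the Nyquist bin for even lengths) must stay real
    # for the result to be a real signal, so keep their original phase.
    phase[0] = np.angle(spectrum[0])
    if signal.size % 2 == 0:
        phase[-1] = np.angle(spectrum[-1])
    return np.fft.irfft(magnitude * np.exp(1j * phase), n=signal.size)


rng = np.random.default_rng(0)  # arbitrary seed, only for repeatability

# A single edge: 128 zeros followed by 128 ones.
edge = np.concatenate([np.zeros(128), np.ones(128)])
scrambled = scramble_phase(edge, rng)

edge_magnitude = np.abs(np.fft.rfft(edge))
scrambled_magnitude = np.abs(np.fft.rfft(scrambled))

# Same magnitude spectrum, very different waveform.
print("magnitudes match:", np.allclose(edge_magnitude, scrambled_magnitude))
print("largest sample difference:", np.max(np.abs(edge - scrambled)))
```

The scrambled signal has the same magnitude spectrum as the edge but looks like coloured noise, which is the 1-D version of the image case.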