by nonhaver 322 days ago
this is great. i think extensions that detect generated music, speech, video, or text will become really important. i'm curious how light and performant these detection models can get. maybe a single extension could handle multiple media types.

one concern (speaking as someone who doesn't know what these internal pipelines look like) is that suno/udio could tweak their model weights just enough to change the fingerprint, making a detector obsolete with each new release. or, even simpler, they could just apply post-processing: i'd imagine a small reverb could smear the audio enough to make the fingerprint difficult to detect. that turns it into a cat-and-mouse game. if it's cheaper for them to mutate models or tweak post-processing than for others to train new detectors, they could spin up a new fingerprint every day.

1 comment

What kind of tweak has enough of an impact is still an open question. According to the paper, the detector does generalize a bit between different models, but different architectures, at least, require retraining for coverage.