by sebstefan 394 days ago
They're never going to manage to do that, just on a technical level

Plus some users might want to legitimately upload things with AI-generated content in it

3 comments

I'm pretty sure YouTube saves the metadata from all the video files uploaded to it. It seems pretty trivial to exclude videos uploaded without camera model or device setting information. I seriously doubt even a tiny fraction of people uploading AI content to YouTube are taking the time to futz about with the XMP data before they upload it. Sure, they'll miss out on a lot of edited videos doing that, but that's probably for the best if you're trying to create a data set that maintains fidelity to the real world. There are lots of ways to create false images without AI.

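
For illustration, the metadata heuristic described above could look something like the sketch below. The field names (`camera_make`, `camera_model`) and the shape of the upload records are hypothetical; real container and XMP tags vary by device and format, and nothing here reflects how YouTube actually stores or filters metadata.

```python
from typing import Iterable

# Hypothetical metadata keys, chosen for illustration only.
CAMERA_KEYS = ("camera_make", "camera_model")


def has_camera_metadata(meta: dict) -> bool:
    """Return True if any camera-identifying field is present and non-empty."""
    return any(str(meta.get(key) or "").strip() for key in CAMERA_KEYS)


def filter_for_training(videos: Iterable[dict]) -> list[dict]:
    """Keep only videos whose metadata names a capture device."""
    return [v for v in videos if has_camera_metadata(v.get("metadata") or {})]


# Made-up upload records: one with device info, one with none, one with a blank field.
uploads = [
    {"id": "a", "metadata": {"camera_make": "Apple", "camera_model": "iPhone 15"}},
    {"id": "b", "metadata": {}},
    {"id": "c", "metadata": {"camera_model": "  "}},
]
kept_ids = [v["id"] for v in filter_for_training(uploads)]
print(kept_ids)
```

As the comment notes, a filter like this also throws away edited or re-encoded footage that lost its device tags, so it trades recall for a cleaner training set.
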
"Since launching in 2023, SynthID has watermarked over 10 billion images, videos, audio files and texts, helping identify them as AI-generated and reduce the chances of misinformation and misattribution. Outputs generated by Veo 3, Imagen 4 and Lyria 2 will continue to have SynthID watermarks.

Today, we’re launching SynthID Detector, a verification portal to help people identify AI-generated content. Upload a piece of content and the SynthID Detector will identify if either the entire file or just a part of it has SynthID in it.

With all our generative AI models, we aim to unleash human creativity and enable artists and creators to bring their ideas to life faster and more easily than ever before."

From the page linked in the post....

So there are different ways to detect AI-generated content (videos/images at least). (https://www.nature.com/articles/s41586-024-08025-4 <-- paper on SynthID watermarking and detection for LLM-generated text)

I somewhat doubt that YT cares much about AI content being uploaded, as long as it’s clearly marked as such.

What they do care about is their training set getting tainted, so I imagine they will push quite hard to have some mechanism to detect AI; it’s useful to them even if users don’t act on it.

> They're never going to manage to do that, just on a technical level

Why not? Given enough data, it's possible to train models to differentiate - especially since humans can pick up on the difference pretty well.

> Plus some users might want to legitimately upload things with AI-generated content in it

Excluding videos from training datasets doesn't mean excluding them from YouTube.

I agree, especially because in practice the vast majority of AI-generated videos uploaded to YouTube are going to be from one of about 3 or 4 generators (Sora, Veo, etc.). May change in the future, but at the moment the detection problem is pretty well constrained.

> Excluding videos from training datasets doesn't mean excluding them from YouTube.

Ah then sure. It was this part that was problematic.

If users are still allowed to upload flagged content, then false positives almost don't matter, so YouTube could just roll out some imperfect solution and it would be fine.
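
A rough back-of-the-envelope sketch of that trade-off, with entirely made-up numbers: the counts and error rates below are assumptions for illustration, not measurements of any real detector. The point is only that a false positive costs some real training data, while a false negative lets AI content leak into the set.

```python
def training_set_after_filter(n_real, n_ai, false_positive_rate, false_negative_rate):
    """Return (real videos kept, AI videos that slip through) when flagged items are dropped."""
    real_kept = n_real * (1 - false_positive_rate)  # real videos not wrongly flagged
    ai_leaked = n_ai * false_negative_rate          # AI videos the detector missed
    return real_kept, ai_leaked


# Hypothetical corpus: 1,000,000 real videos and 50,000 AI-generated ones.
# Hypothetical detector: flags 10% of real videos by mistake, misses 5% of AI videos.
real_kept, ai_leaked = training_set_after_filter(1_000_000, 50_000, 0.10, 0.05)

unfiltered_share = 50_000 / (1_000_000 + 50_000)
filtered_share = ai_leaked / (real_kept + ai_leaked)

print(f"AI share without filtering: {unfiltered_share:.2%}")
print(f"AI share after filtering:   {filtered_share:.2%}")
print(f"Real videos lost to false positives: {1_000_000 - real_kept:,.0f}")
```

Under these assumed rates, the filter gives up a tenth of the real footage but cuts the AI share of the training set by more than an order of magnitude, and no uploader is affected because flagged videos stay on the site.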