by nrmitchi 289 days ago

This seems like a (potential) solution looking for a nail-shaped problem.

Yes, there is a huge problem with AI content flooding the field, and being able to identify/exclude it would be nice (for a variety of purposes).

However, the issue isn't that content was "AI generated"; as long as the content is correct, and is what the user was looking for, they don't really care.

The issue is content that was generated en masse, is largely not correct/trustworthy, and serves only to game SEO/clicks/screentime/etc.

A system where the content you are actually trying to avoid has to opt in is doomed to failure. Is the purpose/expectation here that search/CDN companies attempt to classify and identify "AI content"?

2 comments

TylerE 289 days ago

It's the evil bit, but unironically.

edoceo 289 days ago

For today's lucky 10k:

https://www.ietf.org/rfc/rfc3514.txt

Note date published
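For anyone curious what the RFC actually specifies: the evil bit is the otherwise unused high-order bit of the 16-bit flags/fragment-offset word in the IPv4 header (bytes 6-7). A minimal Python sketch of checking and setting it on a raw header; the function names are illustrative only, not from any library.

```python
import struct

# RFC 3514: the reserved high-order bit of the flags/fragment-offset word is the "evil bit".
EVIL_BIT_MASK = 0x8000


def has_evil_bit(ipv4_header: bytes) -> bool:
    """Return True if the RFC 3514 evil bit is set in a raw IPv4 header."""
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    # Bytes 6-7: 3 flag bits (reserved, DF, MF) followed by the 13-bit fragment offset.
    (flags_and_offset,) = struct.unpack("!H", ipv4_header[6:8])
    return bool(flags_and_offset & EVIL_BIT_MASK)


def set_evil_bit(ipv4_header: bytes) -> bytes:
    """Return a copy of the header with the evil bit set, as the RFC asks attack programs to do."""
    if len(ipv4_header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    (flags_and_offset,) = struct.unpack("!H", ipv4_header[6:8])
    patched = struct.pack("!H", flags_and_offset | EVIL_BIT_MASK)
    # Note: the header checksum (bytes 10-11) is NOT recomputed in this sketch.
    return ipv4_header[:6] + patched + ipv4_header[8:]
```

With a 20-byte header that only has DF set (byte 6 = 0x40), `has_evil_bit` returns False; after `set_evil_bit`, byte 6 becomes 0xC0 and the check returns True.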

0xDEAFBEAD 289 days ago

>Attack applications may use a suitable API to request that [the evil bit] be set. Systems that do not have other mechanisms MUST provide such an API; attack programs MUST use it.

Potential flaw: I'm concerned that attackers may be slow to update their malware to achieve compliance with this RFC. I suggest a transitional API: Intrusion detection systems respond to suspected-evil packets that have the evil bit set to 0 with a depreciation notice.

jrochkind1 289 days ago

deprecation notice

yahoozoo 289 days ago

It says in the first paragraph it’s for crawlers and bots. How many humans are inspecting the headers of every page they casually browse? An immediate problem that could potentially be addressed by this is the “AI training on AI content” loop.
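A minimal sketch of how a training-data crawler might honor such a header. The thread never names the header, so `X-AI-Generated` and the accepted values ("1", "true", "yes") are hypothetical assumptions for illustration only.

```python
from collections.abc import Mapping

# HYPOTHETICAL header name; the proposal's real header is not named in this thread.
AI_CONTENT_HEADER = "X-AI-Generated"
# HYPOTHETICAL set of values treated as "this page is AI-generated".
TRUTHY_VALUES = {"1", "true", "yes"}


def should_train_on(headers: Mapping[str, str]) -> bool:
    """Skip pages that self-label as AI-generated; unlabeled pages pass through."""
    # HTTP header names are case-insensitive, so normalize names and values first.
    normalized = {name.lower(): value.strip().lower() for name, value in headers.items()}
    value = normalized.get(AI_CONTENT_HEADER.lower())
    return value not in TRUTHY_VALUES
```

Note that an unlabeled spam page passes this filter exactly like a human-written one: the filter can only exclude content whose producer chose to opt in.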

TrueDuality 289 days ago

How many of the makers of these trash SEO sites are going to voluntarily identify their content as AI generated?

TheRoque 289 days ago

Moreover, I find it ironic that website owners would graciously give AI companies the power to identify what is "good" data and what is not. I mean, why would I do the work for them and label my data as AI so that they can ignore it? "Yes please, take all my work, this is quality content, train on it, it's free!" That's what it sounds like.

nrmitchi 289 days ago

It would still be required for the content producer (i.e., the content spam farm) to label their content as such.

The current approach is that the content served is the same for humans and agents (i.e., a site serves consistent content regardless of the client), so who a specific header is "meant for" is a moot point here.

nikolayasdf123 289 days ago

I believe this is why Google built SynthID: https://deepmind.google/science/synthid/
