Hacker News
by alex_duf 34 days ago
I'm seeing this stance a lot "this is obviously AI generated"

Why? What's LLM generated? How can you tell?

To me, what's obvious is that our trust system is already breaking down. Commenters accusing each other of being AIs is another example of this.

7 comments

>Why? What's LLM generated? How can you tell?

Not the guy you're responding to, but:

1. The high number of (em) dashes is suspect, though it's unclear whether they manually replaced the em dashes or whether the text is actually human generated.

2. "One additional failure worth noting: one incident response professional in the HN thread, raised a concern that operates independently of the bot problem" feels out of place for a content marketing piece. HN isn't popular enough to be invoked as a source, and referencing it as "the HN thread" seems even weirder, as if the author prompted "write a piece about how google cloud defense sucks, here are some sources: ..."

3. This passage is also suspect because it follows the chained negation pattern, though it's n=1:

>No hardware identifier is transmitted. No attestation is required. No certification layer determines who may participate.

edit:

I also noticed there are 2 other comments, now flagged/dead, that explain their reasoning.

> actually human generated

Human written, not generated.

> HN isn't popular enough to be invoked as a source

Excuse me, what do you mean there? The author happens to read HN too.

>Excuse me, what do you mean there? The author happens to read HN too.

Read the rest of the comment. It's not suspect because it references HN; it's suspect because of the way it references HN. Specifically, it uses the phrase "the HN thread" even though no thread was mentioned before. Maybe it's an editing gaffe, but it's also consistent with how an LLM would write if presented with a list of sources.

Yep, this feels like a smoking gun. The others are circumstantial: maybe indicative, maybe not. While there's a chance this is an editing gaffe, it's overwhelmingly likely to be LLM, ahem, "cruft".

Looks like the moderators are actively deleting comments that call out AI-generated articles now. Grim. This comment will probably be deleted too.

What did you see that made you think that? (It's entirely untrue, btw.)

We haven't said anything specific about genai articles but if you've seen https://news.ycombinator.com/newsguidelines.html#generated or https://news.ycombinator.com/item?id=47340079 it shouldn't be hard to extrapolate.

Both comments appeared as [dead] within a few minutes of being made, despite not appearing as [flagged].

They're visible now, but still. What caused them to appear as [dead] in the first place?

There are several possible reasons, so I'd need links to the specific posts in order to answer.

Mine: https://news.ycombinator.com/item?id=48065850

There was another sibling comment posted around the same time that was also dead.

[flagged]

Quite the opposite. That user's comment was killed because it was classified as AI-generated. Of course, it was a false positive caused by the AI-generated text they quoted. These systems aren't foolproof. But we're very serious about preserving HN for curious conversation between humans.

> But we're very serious about preserving HN for curious conversation between humans.

How does that work when most of the articles posted are now AI generated?

I've ranted about this before, but the gist of the problem is that you're expecting humans to put effort into discussing something that the "author" did not consider worth the effort of creating. That imbalance breaks the implicit bargain that encourages high-effort posting: "the author put effort into creating this, so you should put effort into discussing it."

If it's bad writing, it's not a good fit for HN and should be flagged. Writing that's obviously AI-assisted is bad writing. We're fine with it being flagged and down-weighted off the front page. We routinely refrain from re-upping posts that are obviously AI-assisted. Things still slip through because we don't see everything in advance.

The choppy language is the biggest trigger for me. Examples:

* "With Fraud Defense, there was no process to respond to. The product launched. The requirements page went live."

* "That is not a technical limitation waiting to be engineered around. It is the mechanism."

* "The defeat is mechanical. Bot operators point a camera at a screen, a trivial automation with off-the-shelf hardware."

I could be wrong, of course. Maybe humans are starting to write like LLMs, or maybe it's just confirmation bias on my part.

Look at the number of colons per paragraph. What human puts two colons in a single sentence?

"One additional failure worth noting: one incident response professional in the HN thread, raised a concern that operates independently of the bot problem: …"

The ersatz Ted Talk meets LinkedInfluencer rhythm of sentences, the throat clearing fillers as connective tissue…

Or Wikipedia: https://en.wikipedia.org/wiki/Wikipedia:Signs_of_AI_writing

I do. I usually notice and try to rephrase, though.

(Also, you can pry my em dashes[1] from my cold, dead hands.)

[1] https://www.gally.net/miscellaneous/hn-em-dash-user-leaderbo... says mean 1.64, maximum 13 em dashes per pre-ChatGPT comment.

The entire article is just one long stream of short, punchy, declarative sentences. The latest Claude models are notorious for writing like this.

There are also a few cookie-cutter patterns that should immediately jump out at you if you're at all familiar with AI writing, such as:

> No hardware identifier is transmitted. No attestation is required. No certification layer determines who may participate. User privacy is structurally preserved, not promised.

> Google Cloud Fraud Defense is not a reCAPTCHA update. The QR code is the visible mechanism, but device attestation is the real product.

It's really obvious. The repeated information. The very. short. sentences. The incessant detail. The tangents that go nowhere. And LLMs always try to structure the entire essay into topical sub-sections.

They can't tell. It has become a statistical thing: some percentage of readers will assume any given item is AI generated, so once enough people see something, someone will make the accusation.

"This is AI" is the new "This is shopped", but without the "I can tell by the pixels" rejoinder.

I mean sometimes they're right, but honestly in this day and age does that even matter?