HN Mirror


by grey-area 100 days ago

At this point it’s pretty easy to detect unaltered LLM output because it is such bad writing. That will change over time with training I would hope. At some point I imagine it will be hard to tell.

I honestly don’t know what sites like this will do when that happens and the only way of detecting LLMs is that they are subtly wrong or post too much; we’d be overrun with them.

Not sure if we should be hopeful or fearful that they will improve to be undetectable, but I suspect they will.

4 comments

lelanthran 100 days ago

> That will change over time with training I would hope.

There's precious little training material left that isn't generated by LLMs themselves.

Consider this to be model collapse (i.e. we might be at the best SOTA possible with the approach we use today - any further training is going to degrade it).


matricks 100 days ago

> There's precious little training material left that isn't generated by LLMs themselves.

Percentage-wise this is quite exaggerated.

> Consider this to be model collapse (i.e. we might be at the best SOTA possible with the approach we use today - any further training is going to degrade it).

You consider this above factor to lead to model collapse? You’ve only mentioned one factor here; this isn’t enough. I’m aware of the GIGO factor, yes. Still there are at least ~5 other key factors needed to make a halfway decent scaling prediction.

It is worth mentioning one outside view here: any one human technology tends to advance as long as there are incentives and/or enthusiasts that push it. I don’t usually bet against motivated humans eventually getting somewhere, provided they aren’t trying to exceed the actual laws of physics. There are bets I find interesting: future scenarios, rates of change, technological interactions, and new discoveries.

Here are two predictions I have high uncertainty about. First, the transformer as an architectural construct will NOT be tossed out within the next five years because something better at the same level is found. Second, SoTA AI performance will probably advance due to better fine-tuning training methods, hybrid architectures, and agent workflows.


lelanthran 99 days ago

> There's precious little training material left that isn't generated by LLMs themselves.

> Percentage-wise this is quite exaggerated.

How exaggerated?

a) The percentage is not static, but continuously increasing.

b) Even if it were static, you only need a few generations for even a small percentage to matter.

> You consider this above factor to lead to model collapse? You’ve only mentioned one factor here; this isn’t enough. I’m aware of the GIGO factor, yes. Still there are at least ~5 other key factors needed to make a halfway decent scaling prediction.

What are those other factors, and why isn't GIGO sufficient for model collapse?


sebastiennight 100 days ago

I wouldn't say it's "bad writing", but rather that the sheer volume of it allows the attentive reader to quickly identify the tropes and get bored of them.

Similar to how you can watch one fantastic western/vampire/zombie/disaster/superhero movie and love it, but once Hollywood has decided that this specific style is what brings in the money, they flood the zone with westerns, or superhero movies or whatever, and then the tropes become obvious and you can't stand watching another one.

If (insert your favorite blogger) had secret access to ChatGPT and was the only person in the world with access to it, you would just assume that it's their writing style now, and be ok with it as long as you liked the content.


grey-area 100 days ago

It is objectively bad writing:

Overly focussed on style over content

Melodrama even when discussing the mundane

Attention-grabbing tricks, like binary opposites, overused constantly

Overuse of adjectives and adverbs in particularly inappropriate places.

Lack of coherence if you’re generating large bits of text

General dull tone and lack of actual content in spite of the tricks above

Re your assertion at the end - sure, if I didn’t know, I’d think it was a particularly stupid, melodramatic human who never got to the point, and I’d probably avoid their writing at all costs.


kristianp 100 days ago

Sites like this will have to start using bot detection. Captchas, Anubis.


lucumo 100 days ago

> At this point it’s pretty easy to detect unaltered LLM output because it is such bad writing.

And yet people seem to still be terrible at that. Someone uses an em-dash and there's always a moron calling it out as AI.

> I honestly don’t know what sites like this will do when that happens and the only way of detecting LLMs is that they are subtly wrong or post too much, we’d be overrun with them.

My personal take is that it doesn't really matter. Most posts are already knee-jerk reactions with little value. Speaking just to be talking. If LLMs make stupid posts, it'll be basically the same as now: scroll a bit more. And if they chance upon saying something interesting then that's a net gain.


grey-area 100 days ago

Never seen this in the wild, but that sounds unfortunate about em-dashes.

Personally, I think it will matter deeply if sites like this are overrun by bots. If you believe your description, why are you here?
