by Jevon23 663 days ago
ChatGPT comment?

It is, and I'm curious what dang and HN's plan is wrt this issue going forward. On one hand, "assume good faith" has been a core tenet of this community. At the same time, LLM-generated walls of text aren't good faith. And they're not going to get less common from here on out.

I'm also surprised by how many human replies these comments get, seemingly unaware of what they're responding to. Given that it's HN and how long it's been since the release of GPT-3, I thought a larger percentage of readers would notice.

>> ChatGPT comment?

> It is

What? Huh? How did you determine this?

Very obvious ChatGPT style and structure. Here's another one of his comments copy/pasted from ChatGPT. Many others have called him out on this. He is a pathological liar.

https://news.ycombinator.com/item?id=41274200 and then his later reply in the same thread, also written by ChatGPT https://news.ycombinator.com/item?id=41277573

It's truly a Rorschach test of sorts. I agree with you that there isn't enough information to say, but reading through the comment history of the commenter in question does not make it seem more likely that they are GPT. Reminds me of Fallout 4 with everyone suspicious of each other being synths.

On the contrary, the comment history makes it very clear.

Pages and pages of relatively short comments, not a single one written in a remotely LLM-reminiscent style. Then, within a very short period, multiple very long comments in exactly the default style that GPT writes in.

The chances of someone waking up one day and entirely changing their writing style might as well be zero; I've never seen it. It would be a gradual process, if anything.

I read HN every day and I think this is only the 2nd time I've come across clearly generated content. If suspicion is the issue, that should be much more frequent. On Reddit it's already more common, and I've already had multiple people admit to it when pointed out, asking "How did you know?".

It does help that I've spent the last 1.5 years building LLM-based products every day.

Is the redundancy giving you the hint it’s GPT? I would love to know what it is that has convinced you but that you seemingly cannot explain.

A few weeks ago I had a eureka moment to describe it: GPT writes just like a non-native speaker who has spent the last month at a cram school purely aimed at acing the writing part of the TOEFL/IELTS test to study abroad. There, they absolutely cram repeatable patterns, which are easy to remember, score well and can be used in a variety of situations. Those patterns are not even unnatural - at times, native speakers do indeed use them too.

The problem is dosage. GPT and cram school students use such patterns in the majority of their sentences. Fluent speakers/humans only use them once in a while. Their temperature is much higher! English is a huge language grammatically, super dynamic - there's a massive variety of sentence structures to choose from. But by default, LLMs just choose whichever one is the most likely given the dataset they have been trained on (and RLHF etc.); that's the whole idea. In real life, everyone's dataset and feedback are different. My most likely grammar pattern is not yours. Yet with LLMs, by default, it's always the same.
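
To make the temperature point concrete, here's a minimal sketch in Python of how temperature reshapes a probability distribution over options. The three pattern names and their scores are made up for illustration, and this is only the textbook softmax-with-temperature formula, not a claim about how any particular model is configured.

```python
import math

def softmax_with_temperature(scores, temperature):
    """Turn raw scores into probabilities; lower temperature sharpens the peak."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical scores for three sentence patterns (invented numbers).
patterns = ["template opener", "plain statement", "fragment"]
scores = [2.0, 1.0, 0.5]

for t in (0.2, 1.0, 2.0):
    probs = softmax_with_temperature(scores, t)
    print(f"temperature={t}:", dict(zip(patterns, (round(p, 3) for p in probs))))
```

At a low temperature nearly all of the probability lands on the top-scoring pattern, which is the "always the same" effect. At a higher temperature the alternatives get a real share, which is closer to how varied human writing is.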

It also makes perfect sense in a different way; at this point in time LLMs are still largely developed to beat very simplistic benchmarks using their "default" output. And English language exams are super similar to those benchmarks; I wouldn't be surprised if they were actually already included. So the optimal strategy to do well at those without actually understanding what's going on, but pretending to do so, ends up being the same. Just in this case it's LLMs pretending instead of students.

I should probably write a blog post about this at some point. Some might be curious: Does this mean that it's not possible to make LLMs write in a natural way? No, it's already very possible, and it doesn't take too much effort to make them do so. I'm currently developing a pico-SaaS that does just that, inspired by seeing these comments on Reddit, and now HN. Don't worry, I absolutely won't be offering API access and will be limiting usage to ensure it's only usable for humans, so no contributing to robotic AI spam from me.

I'd give you concrete examples, but in the comment in question literally every single sentence is a good example. Literally after the second sentence, the deal is sealed.

There are other strong indicators besides the structure - phrasings, cadence, sentence lengths and just content in general, but you don't even really need those. If you don't see it, instead of looking at the comment as a paragraph, split it up and put each sentence on a newline. If you still don't see it, you could try searching for something like "English writing test essay template".
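
If you'd rather not do the splitting by hand, a rough Python sketch like the one below will put each sentence on its own line. The function name is hypothetical, and the splitter is deliberately naive: it breaks after any ".", "!" or "?" followed by whitespace, so abbreviations like "e.g." will be split incorrectly.

```python
import re

def one_sentence_per_line(paragraph):
    """Naive splitter: break after '.', '!' or '?' when followed by whitespace."""
    sentences = re.split(r"(?<=[.!?])\s+", paragraph.strip())
    return "\n".join(s for s in sentences if s)

# Paste the comment you are suspicious of here.
sample = "First point here. Second point follows! Is there a third?"
print(one_sentence_per_line(sample))
```

With the sentences stacked like this, repeated openers and similar sentence shapes are much easier to spot than in a single block of text.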

I remember that there were "leaks" out of OpenAI that they had an LLM detector which was 99.9% accurate but they didn't want to release it. No idea about the veracity, but I very much believe it, though it's 100% going to be limited to those using very basic prompts like "write me a comment/essay/post about ___". I'm pretty sure I could build one of these myself quite easily, but it'll be pointless very soon anyway, as LLM UIs improve and LLM platforms start providing "natural" personas as the norm.