|
A few weeks ago I had a eureka moment about how to describe it: GPT writes just like a non-native speaker who has spent the last month at a cram school purely aimed at acing the writing part of the TOEFL/IELTS test in order to study abroad. There, they absolutely cram repeatable patterns, which are easy to remember, score well and can be used in a variety of situations. Those patterns are not even unnatural - at times, native speakers do indeed use them too. The problem is dosage. GPT and cram school students use such patterns in the majority of their sentences. Fluent speakers/humans only use them once in a while. Their temperature is much higher!

English is a huge language grammatically, super dynamic - there's a massive variety of sentence structures to choose from. But by default, LLMs just choose whichever one is the most likely given the dataset they have been trained on (plus RLHF etc.), that's the whole idea. In real life, everyone's dataset and feedback are different. My most likely grammar pattern is not yours. Yet with LLMs, by default, it's always the same.

It also makes perfect sense in a different way: at this point in time LLMs are still largely developed to beat very simplistic benchmarks using their "default" output. And English language exams are super similar to those benchmarks; I wouldn't be surprised if they were actually already included. So the optimal strategy to do well at those without actually understanding what's going on, but pretending to do so, ends up being the same. Just in this case it's LLMs pretending instead of students. I should probably write a blog post about this at some point.

Some might be curious: does this mean that it's not possible to make LLMs write in a natural way? No, it's already very possible, and it doesn't take too much effort to make them do so. I'm currently developing a pico-SaaS that does just that, inspired by seeing these comments on Reddit, and now HN.
Don't worry, I absolutely won't be offering API access and will be limiting usage to ensure it's only usable for humans, so no contributing to robotic AI spam from me.

I'd give you concrete examples, but in the comment in question literally every single sentence is a good example. Literally after the second sentence, the deal is sealed. There are other strong indicators besides the structure - phrasings, cadence, sentence lengths and just content in general, but you don't even really need those. If you don't see it, instead of looking at it as a paragraph, split it up and put each sentence on a new line. If you still don't see it, you could try searching for something like "English writing test essay template".

I remember that there were "leaks" out of OpenAI that they had an LLM detector which was 99.9% accurate but they didn't want to release it. No idea about the veracity, but I very much believe it, though it's 100% going to be limited to those using very basic prompts like "write me a comment/essay/post about ___". I'm pretty sure I could build one of these myself quite easily, but it'll be pointless very soon anyway, as LLM UIs improve and LLM platforms start providing "natural" personas as the norm. |
I dunno. I believe you see that in it, but to me it just reads like any other Internet comment. Nothing at all stands out about it, to me. Hence my surprise at the strong assertions by two separate commenters.