Hacker News
by hxugufjfjf 1285 days ago
The GPT output detector certainly thinks you are generating this and multiple other comments with GPT. https://i.imgur.com/ps0a54c.png

It's wrong. Does this mean as a hooman I've failed a Turing test? You have my entire corpus on hand here which predates GPT general release, find a long one from times past and see what it says? I'd argue this suggests the bot detector has a weak language model.

If this genuinely interests you, I can suggest several sources of writing by others which I expect would fail the test.

Interesting. I don't know exactly what it means or what the implication of this is. Before continuing, I just want to clarify that I didn't intend to "call you out" on generating comments with GPT. I just thought it was interesting to test, because someone did actually accuse you of it. And what I found was that the current detection model (https://huggingface.co/openai-detector/) based on GPT-2 suggests that this might be the case (> 99.5% probability of being GPT), and I thought that was very interesting. And to me, it becomes even more interesting when you specifically deny that it was.

For the record I wrote a UserScript to easily identify this (https://news.ycombinator.com/item?id=33906712) based on the model from above, so it's very easy to test comments. I've used the extension while browsing HN the last few days and this is the first time I've seen a comment identified confidently as AI where it's not the case (given that you are not lying). It sounds like I'm calling you out, but I'm genuinely just curious as this is the first true false positive I've seen. The extension has so far only identified non-human comments as AI, but that's just my observation and not some empirical research.

Because you asked, I went a few weeks back in your comment history, at least predating ChatGPT (even though GPT-3 has been available for longer than that) and "tested" a few of your longer comments, and I can't find a single comment that so much as hints at you using GPT-3 to generate comments. Most of your comments that I sampled are identified as < 10% probability of being GPT generated. Based on this we could conclude either that the above comment was an anomaly in the way you write, or that you are lying, or that the model is not sufficient, like you suggest.
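For anyone who wants to repeat this kind of spot check on a batch of comments, here is a minimal sketch rather than the UserScript itself. It assumes the Hugging Face `transformers` text-classification pipeline and the `roberta-base-openai-detector` model id, and it assumes the model reports the labels "Fake" and "Real"; the helper names (`load_detector`, `fake_probability`, `score_comments`) and the 0.5 threshold are made up for illustration.

```python
from typing import Callable, Dict, List

Prediction = Dict[str, object]
Classifier = Callable[[str], Prediction]


def load_detector(model_id: str = "roberta-base-openai-detector") -> Classifier:
    """Wrap the detector as a plain text -> prediction callable (assumed model id)."""
    # Imported lazily so the scoring helpers below work without transformers installed.
    from transformers import pipeline

    pipe = pipeline("text-classification", model=model_id)

    def classify(text: str) -> Prediction:
        # truncation=True keeps long comments within the model's input limit.
        top = pipe(text, truncation=True)[0]
        return {"label": top["label"], "score": float(top["score"])}

    return classify


def fake_probability(prediction: Prediction, fake_label: str = "Fake") -> float:
    """Convert a top-label prediction into P(generated); the label name is an assumption."""
    score = float(prediction["score"])
    return score if prediction["label"] == fake_label else 1.0 - score


def score_comments(comments: List[str], classify: Classifier,
                   threshold: float = 0.5) -> List[dict]:
    """Score each comment and flag those at or above the (arbitrary) threshold."""
    rows = []
    for text in comments:
        p = fake_probability(classify(text))
        rows.append({"text": text, "p_fake": p, "flagged": p >= threshold})
    return rows
```

Usage would be something like `score_comments(my_comments, load_detector())`. Passing the classifier in as a plain callable makes it easy to swap in a different detector and compare results on the same comments.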

I am genuinely interested in this, and my intentions are absolutely not malicious. If you want to share sources of writing by others you suggest will "fail the test" then I'm interested in testing that.

I assure you I am not lying. It would be hard to prove because even under invigilated writing you could argue the negatives were evidence I must have been in the prior case.

For others to test, try Dave Mills, who worked on NTP, and Jon Crowcroft, now at Cambridge uni, both of whom had/have what you might call "idiosyncratic" styles of writing.

I'd go a lot further back than a few weeks in mine. I tend to comment in fits and starts. Try hunting for other times people said "why do you write like this, it's stupid".

Btw don't try their corpus of formal writing, it's less likely to trigger. They wrote extensively in pre-commercial internet mailing lists, where I think you'd find the style of writing.

I don't think you are lying and by no means am I accusing you of lying, or trying to "prove" that you could be. I just think we've discovered something interesting and I'm trying to make sense of it. It does appear on further testing that even minor tweaks to the way you write would easily tip your text from 99.98% AI to 99.98% human. I can only conclude that the detection model's numbers are unintentionally deceiving, because it's a very fine line. Without more information about what "triggers" detection in the comment you made, I don't see a way to "get to the bottom" of this. I am trying to learn more about it now: https://huggingface.co/roberta-base-openai-detector
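One way to make the "fine line" observation repeatable is to score a comment alongside a few hand-edited variants and see which edits cross the decision boundary. This is only a sketch: `flip_report` is a hypothetical helper, the 0.5 threshold is arbitrary, and `fake_prob` stands for whatever function returns the detector's probability that a text is generated.

```python
from typing import Callable, List, Tuple


def flip_report(original: str, variants: List[str],
                fake_prob: Callable[[str], float],
                threshold: float = 0.5) -> Tuple[float, List[dict]]:
    """Compare each variant's score with the original's and mark verdict flips."""
    base = fake_prob(original)
    rows = []
    for text in variants:
        p = fake_prob(text)
        rows.append({
            "text": text,
            "p_fake": p,
            "delta": p - base,  # negative means the edit looks more human
            # A flip means the edit moved the text across the threshold.
            "flipped": (base >= threshold) != (p >= threshold),
        })
    return base, rows
```

If small edits such as changing punctuation or splitting one sentence produce large values of `delta`, that would support reading the model's confidence figures with caution.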

I did test those you suggested, with various excerpts from their mailing list communication, research and books. Both people got on average >99.98% probability of being real, with both large and small samples from all the different sources. I'm not sure what to make of this.

Hmm. I'd hoped either or both of them would tickle the same detection logic as me. I must be even more incoherent than I thought.