by Jensson 1207 days ago
> The claim was that it pulled the game out of its dataset. If this were the case, I would argue it would absolutely be trivial to find them. It’s not some concept that can’t be described in words or would be hard to quantify. The rules have been provided, and, assuming they were plagiarized from somewhere else, would be listed verbatim or close to it.

ChatGPT uses word vectors, so it won't use the same words but variants of them. You can't search for that. Cases where every word vector maps to a single word with no variations are very rare, so ChatGPT is very good at plagiarising things without reproducing them exactly; it just rarely fails at it.

> If a student plagiarized on their work, whether in written form or in code, it’s been trivially easy to find the exact work that was copied from. It generally takes me a few seconds of searching to find it.

No it isn't; they just change the words and rewrite it until it no longer looks the same. ChatGPT is trained to rewrite texts like that to avoid triggering trivial plagiarism detectors. They train it to produce the same text but with different words; producing exactly the same text is punished.


> No it isn't; they just change the words and rewrite it until it no longer looks the same. ChatGPT is trained to rewrite texts like that to avoid triggering trivial plagiarism detectors. They train it to produce the same text but with different words; producing exactly the same text is punished.

Do you think students plagiarizing don’t do the exact same thing? Clearly someone has never actually dealt with plagiarized work. This is plagiarizing 101. The structure remains the same even if they use synonyms. Considering it’s trivially easy to find in code, which is orders of magnitude harder to pull off, I would still argue it should be easy as pie to find this supposed set of rules.

Your point is not very credible without proof of this game existing and ChatGPT pulling it from this source. Unless you show that this supposed proto-game existed with rules that ChatGPT could pull from, all you’ve done is wave your hands around and yell “similar games exist so this can’t possibly be uniquely generated”, and that’s not a very compelling argument.

> Do you think students plagiarizing don’t do the exact same thing? Clearly someone has never actually dealt with plagiarized work. This is plagiarizing 101. The structure remains the same even if they use synonyms.

You rewrite the structure of the text, you don't just use synonyms. ChatGPT is capable of rewriting text to a different structure while keeping the meaning, I hope you are aware of that.

Anyway, even if you just change the words to synonyms it won't be easy to find in a search engine. Search engines aren't very good at finding matches to synonyms. Google tries, but in doing so it fails to find more specific texts like scientific publications or documentation. So no, search engines aren't good at finding plagiarism.
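To make the synonym point concrete, here is a rough Python sketch. Everything in it is made up for illustration: the two sentences, the tiny `SYNONYMS` table, and the helper names are assumptions, and no real search engine or plagiarism detector is claimed to work this way. It only shows that an exact-phrase lookup misses a reworded sentence, while a comparison that first maps synonyms to a common word scores the pair as much closer.

```python
import re

# Hypothetical, hand-made synonym table (an assumption for this sketch);
# a real tool would need a far larger lexical resource.
SYNONYMS = {
    "place": "fill",
    "number": "digit",
    "blank": "empty",
    "square": "cell",
    "line": "row",
    "hint": "clue",
}


def tokens(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def normalize(words):
    """Map each word to its canonical synonym when the table has one."""
    return [SYNONYMS.get(w, w) for w in words]


def jaccard(a, b):
    """Jaccard similarity of two word lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


# Invented example sentences, not taken from any real rule set.
original = "Fill each empty cell with a digit so that every row matches its clue."
reworded = "Place a number in each blank square so that every line matches its hint."

# Exact-phrase lookup: is the original sentence contained in the reworded text?
exact_hit = original.lower() in reworded.lower()

# Word overlap before and after synonym normalization.
raw_score = jaccard(tokens(original), tokens(reworded))
norm_score = jaccard(normalize(tokens(original)), normalize(tokens(reworded)))

print("exact phrase found:", exact_hit)
print("raw word overlap:", round(raw_score, 2))
print("overlap after synonym normalization:", round(norm_score, 2))
```

The catch is that the normalized comparison only works because the table happens to contain exactly the substitutions used, and it still needs a candidate source to compare against. It does nothing for text whose structure was rewritten too.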

Edit: And you make it sound like most plagiarism is found. No, that isn't the case; most plagiarism is never found out because it is a very hard problem to solve. Only the most blatant cases are caught. For humans that is reasonable; for AI we can be stricter since there isn't a human's career at stake.

> Anyway, even if you just change the words to synonyms it won't be easy to find in a search engine.

Got it, so you’ve never actually dealt with plagiarized work. You should have just led with that.

I have literally said, from actual experience, that this is the case. But I guess discarding that, pretending it was never said, and assuming the opposite is true is, I’m sure, an easier position to hold.

Do you believe you never missed any plagiarised work examples? You caught some people doing X, and then you declare that catching people who do X is trivial. But plenty of people get away with doing X so we know that it isn't easy to catch.

Students are probably easier to catch since they use the same tools you do: they use a search engine to find an article and plagiarise that. But ChatGPT takes deep discussions from Reddit or Stack Overflow; I can't find those with a search engine.

If it’s as blatant as copying the entire game, you’d think it would be easier for you to find the game it copied. By your own account, this is an example of an obvious case of plagiarism. You were dead set on it, 100% sure.

Yet here we are. A dozen comments later and still no written set of rules has been produced that definitively shows it was copied.

Come back when you actually have that and maybe we can continue this conversation.

> But ChatGPT takes deep discussions from Reddit or Stack Overflow; I can't find those with a search engine.

Where do you think the answers come from? It’s not like Google has a massive index island around Reddit and SO.

I tend to exaggerate my claims a bit, yes. But you exaggerate your claims as well. For example, you claim that if it had copied the rules it would be easy to find the source, and that isn't true at all. Many examples of plagiarism go unnoticed for years, until someone who is familiar with the original work points it out. I know of a case where the person was found out during his thesis defence: he had plagiarised his entire PhD work from papers in another language, and nobody noticed until years later, not even the peer reviewers of the papers.

So maybe these rules are described in Japanese? Most similar games come from Japan: Kakuro, Sudoku, etc. Would your plagiarism detection method of Googling it find a Japanese source? I doubt it. But ChatGPT transcends language barriers; it can translate to English just fine.