by dogma1138 1234 days ago
I’m not sure that’s how it would actually work out.

Most humans can’t write, say, an essay to save their lives.

And those who do write very well tend to have their own signature.

Whilst it’s not 100% accurate, we’ve managed to fairly successfully attribute a lot of unknown works to specific authors based on their known works.

So if you create a generator that produces output equal to, say, the top 1% of human authors, I’m not entirely sure that you can get one that doesn’t have its own signature.

Because whilst, as you said, most humans produce output that is statistically indistinguishable from that of most other humans, the output that tends to survive selection bias and become known works is quite distinguishable by definition.

So you don’t even need to get to superhuman capability; you just need output quality high enough to narrow the statistical search space from billions of candidate authors to millions or even thousands.


This may be along the lines of what you’re suggesting, but what if you flipped this around: instead of trying to recognize AI, you recognize the student? You model each student’s quirks so you can tell if they wrote their essay, or if someone else did. Now you don’t care about AI specifically; you just care about whether they wrote what they submitted.

The main failure mode I see here is students dramatically improving and throwing the system off. If someone gets a tutor or goes to writing workshops, you don’t want to accuse them of plagiarism just because they got better. But there may be ways you could deal with that, like having the student submit new samples.

That could work, but it changes the problem and moves the goalposts. A plagiarism detection system that is essentially trained on individual authors would be able to flag any time they skew too far from their rolling average.

I’m not even sure if ML is absolutely necessary for this or not.
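
For what it’s worth, a non-ML version of the rolling-average idea could be sketched roughly like this. Everything here is an assumption for illustration: the three style features, the window of 5 essays, the z-score threshold of 3.0, and the function names (features, drift_score, is_suspicious) are all made up, not taken from any real plagiarism detector.

```python
import re
import statistics


def features(text):
    """Return a few crude style features for one essay (illustrative choices only)."""
    words = re.findall(r"[a-z']+", text.lower())
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return [
        len(words) / len(sentences),              # mean sentence length in words
        sum(len(w) for w in words) / len(words),  # mean word length in characters
        len(set(words)) / len(words),             # type-token ratio (vocabulary variety)
    ]


def drift_score(history, new_text, window=5):
    """Largest absolute z-score of the new essay against the last `window` essays."""
    recent = [features(t) for t in history[-window:]]
    if len(recent) < 2:
        raise ValueError("need at least two prior samples")
    new = features(new_text)
    score = 0.0
    for i, value in enumerate(new):
        column = [f[i] for f in recent]
        mean = statistics.mean(column)
        spread = statistics.stdev(column) or 1e-9  # guard against zero spread
        score = max(score, abs(value - mean) / spread)
    return score


def is_suspicious(history, new_text, threshold=3.0, window=5):
    """Flag an essay whose style drifts past the (assumed) threshold."""
    return drift_score(history, new_text, window) > threshold


history = [
    "I like cats. Cats are fun. I pet my cat.",
    "I like dogs. Dogs are fun too. I walk my dog.",
    "I like birds. Birds can sing. I feed my bird.",
]
same_style = "I like fish. Fish are calm. I watch my pond."
other_style = (
    "Notwithstanding considerable methodological heterogeneity, contemporary "
    "stylometric investigations consistently demonstrate remarkable "
    "discriminative capability."
)
print(is_suspicious(history, same_style))
print(is_suspicious(history, other_style))
```

With only a handful of samples the spread is tiny, so small, legitimate changes can trip the threshold. That is the same failure mode as the student who simply got better, and re-baselining on fresh samples would amount to resetting history.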