| HN Mirror


by michaericalribo 1232 days ago

I foresee a dystopian education outcome:

1. Classifiers like this are used to flag possible AI-generated text

2. Non-technical users (teachers) treat this like a 100% certainty

3. Students pay the price.

Especially with a true positive rate of only 26% and a false positive rate of 9%, this seems next to useless.

37 comments

LarryMullins 1232 days ago

> 2. Non-technical users (teachers) treat this like a 100% certainty

This is the part that needs to be addressed the most. Teachers can't offload their critical reasoning to the computer. They should ask their students to write things in class and get a feeling for what those individual students are capable of. Then those that turn in essays written at 10x their normal writing level will be obvious, without the use of any automated cheat detectors.

I was once accused of cheating by a computer; my friend and I both turned in assignments that used do-while loops, which the computer thought was so statistically unlikely that we surely must have worked together on the assignment. But the explanation was straightforward; I had been evangelizing the aesthetic virtue of do-while loops to anybody that would listen to me, and my friend had been persuaded. Thankfully the professor understood this once he compared the two submissions himself and realized we didn't even use the do-while loop in the same part of the program. There was almost no similarity between the two submissions besides the statistically unlikely but completely innocuous use of do-while loops. It's a good thing my professor used common sense instead of blindly trusting the computer.

munificent 1232 days ago

I think you're misunderstanding the primary purpose of essays.

Teachers don't have the time to do deep critical reasoning about each student's essay. An essay is only partially an evaluation tool.

The primary purpose of an essay is that the act of writing an essay teaches the student critical reasoning and structured thought. Essays would be an effective tool even if they weren't graded at all. Just writing them is most of the value. A big part of the reason they're graded at all is just to force students to actually write them.

The main problem with AI generated essays isn't that teachers will lose out on the ability to evaluate their students. It's that students won't do the work and learn the skills they get from doing the work itself.

It's like building a robot to do push ups for you. Not only does the teacher no longer know how many push ups you can do, you're no longer exercising your muscles.

YeGoblynQueenne 1232 days ago

>> The primary purpose of an essay is that the act of writing an essay teaches the student critical reasoning and structured thought. Essays would be an effective tool even if they weren't graded at all. Just writing them is most of the value. A big part of the reason they're graded at all is just to force students to actually write them.

That's our problem, I think. Education keeps failing to convince students of the need to be educated.

Zababa 1232 days ago

I think that students know they need to be educated, but they also know that grading/academic success, in the form of good grades and going to prestigious universities, matters more than actual knowledge in the real world. And the funny thing is that if you teach critical reasoning to someone, there's a good chance they will use that skill to realize that the grade of the essay matters more than the actual process of writing it.

I think companies face a similar problem when they try to introduce metrics to evaluate performance, either of individual employees or of whole parts of the company, and people start focusing on gaming these metrics instead of doing what's actually beneficial to the company. One reason for that is probably that it's really hard to evaluate what actually benefits the company, and what part you played in it.

Back to students, maybe writing that essay instead of asking GPT-3 is more beneficial in the long run, but on the other hand you're also learning to use a new tech that will keep getting better, but maybe you're not learning the "value of hard work correctly", etc etc. Evaluating what's good for you is very hard; focusing on a good grade is easier and has noticeable positive results. I think getting educated is very important, but I also think no one can know for certain if learning to use AI is actually a worse thing than doing stuff "yourself".

All in all, it's a very hard problem. It's trying to see the consequences of our own actions in very complex systems. And different people work differently. For example, when I use ChatGPT or Copilot, I end up spending more time overall working, and producing way more stuff even without counting what the AI "produced", because the back and forth between me and the AI is a more natural way of working for me. In the same vein, it's easier for me to write or even think by acting out a conversation. Maybe for some people it's the exact opposite and they need to be alone with their thoughts to be more productive.

munificent 1231 days ago

Delaying gratification is hard for all of us. We're just primates doing the best we can with our limited wetware.

hndamien 1232 days ago

Seems like it would be fairly trivial to make a document writer that measured whether a human was doing the typing, so that the text was much more likely to have been written by a human sitting and thinking at a keyboard. We do it in ad fraud detection all the time, at scale, with much less willing participants.

BurningFrog 1232 days ago

The value of a degree is very clear.

The value of an education is much less clear.

I'm saying the students are probably right.

Al-Khwarizmi 1232 days ago

> It's like building a robot to do push ups for you. Not only does the teacher no longer know how many push ups you can do, you're no longer exercising your muscles.

While I already knew what you have described, I love this analogy, it's really spot on.

thelock85 1232 days ago

For this exact reason, I feel like education systems and curriculum providers (teachers are just point of contact from a requirements perspective) should develop much more complex essay prompts and invite students to use AI tools in crafting their responses.

Then it’s less about the predetermined structure (5 paragraphs) and limited set of acceptable reasoning (whatever is on the rubric), and more about using creative and critical thinking to form novel and interesting perspectives.

I feel like this is what a lot of universities and companies currently claim they want from HS and college grads.

desro 1232 days ago

This is what I'm doing as an instructor at some local colleges. A lot of the students are completely unaware of these tools, and I really want to make sure they have some sense of how things are changing (inasmuch as any of us can tell...)

So I invite them to use chatGPT or whatever they like to help generate ideas, think things out, or learn more. The caveat is that they have to submit their chat transcript along with the final product; they have to show their work.

I don't teach any high-stakes courses, so this won't work for everyone. But educators are deluded if they think anyone is served by pretending that (A) this doesn't/shouldn't exist, and that (B) this and its successors are going away.

All of this stuff is going to change so much. It might be a bigger deal than the Internet. Time will tell.

sitkack 1232 days ago

I like this technique. You could also take a ChatGPT essay and have the students rewrite it or analyze for style.

Or have a session on how to write the prompts to generate the good stuff. In the hands of a skilled liberal artist, the models produce amazing results.

Yes the tool is powerful, but it still requires skills, knowledge and an ascetic voice.

Al-Khwarizmi 1232 days ago

A student can't go from zero to "much more complex essay prompts", though. Education has to go step by step. The truth is that humans start at a lower writing skill than ChatGPT. Before getting better than it, they need to first reach its level.

And then, there is the problem that those complex prompts might also become automatable when GPT-4 or GPT-5 is released.

class4behavior 1232 days ago

>Teachers don't have the time to do deep critical reasoning about each student's essay.

Projection much? Who are you speaking for? What countries, what states?

It's difficult to dive in that deep into someone's essay in any case. That's the challenge, not the lacking quality of one's education system.

Fomite 1232 days ago

I read every student essay I grade twice. Small classes, admittedly, but this has always been my practice.

geph2021 1232 days ago

> ask their students to write things in class and get a feeling for what those individual students are capable of. Then those that turn in essays written at 10x their normal writing level will be obvious

I think that's a flawed approach. Plenty of people simply don't perform or think well under imposed time-limited situations. I believe I can write close to 10x better with 10x the time. To be clear, I don't mean writing more, or a longer essay, given more time. Personally, the hardest part of writing is distilling your thoughts down to the most succinct, cogent and engaging text.

deepspace 1232 days ago

> Plenty of people simply don't perform or think well under imposed time-limited situations

From first-hand experience, the difference between poor stress-related performance and a total lack of knowledge is night and day.

I have personally witnessed students who could not speak or understand the simplest English, and were unable to come up with two coherent sentences in a classroom situation, but turned in graduate level essays. The difference is blindingly obvious.

giovannibonetti 1232 days ago

> I have personally witnessed students who could not speak or understand the simplest English, and were unable to come up with two coherent sentences in a classroom situation, but turned in graduate level essays. The difference is blindingly obvious.

Maybe someone helped them with their homework?

remexre 1232 days ago

Unless their in-class performance increases as well, isn't that help "probably cheating"? (That's the "moral benchmark" I'd use, at least; if your collaboration resulted in you genuinely learning the material, it's probably not cheating.)

runarberg 1232 days ago

The point is for the teacher to get a sense of the student's style and capabilities. Even if your home essay is 10x better and 10x more concise than your in-class work, a good teacher who knows you, unlike an inference model, will be able to extrapolate and spot commonalities. A good teacher (who isn’t overworked) will also talk to students and get a sense of their style and capabilities that way, which allows them to extrapolate even better than a computer could ever hope to.

zopa 1232 days ago

Sure, but what about all the students with mediocre and/or overworked teachers? If our plan assumes the best-case scenario, we're going to have problems.

runarberg 1232 days ago

Honestly, if we can’t have nice things and we keep skimping on education, I’d rather we just accept the fact that some students will cheat than introduce another subpar technical solution to a societal problem.

londons_explore 1232 days ago

> blindly trusting the computer.

Professors blindly trust the computer not out of laziness, but to protect themselves from accusations of unfairness...

"The work was detected as plagiarism, but the professor overrode it for the pretty girl in class, but not for me"

mitchdoogle 1232 days ago

Seems like something like this should only be used as a first-level filter. If the writing doesn't pass, it warrants more investigation. If no proof of plagiarism is found, then there's nothing else to do and the professor must pass the student.

TchoBeer 1232 days ago

with a 26% true positive rate that seems flawed.

busyant 1232 days ago

I asked chatgpt to write an essay as if it were written by a mediocre 10th grader. It did a reasonably good job. It threw in a little bit of slang and wasn’t particularly formal.

Edit. I sometimes tell my students “if you’re going to cheat, don’t give yourself a perfect score, especially if you’ve failed the first exam. It fires off alarm bells.”

But the students who struggle usually can’t calibrate a non-suspicious performance.

I guess the same applies here.

Baeocystin 1232 days ago

You've touched upon a central issue that is not often addressed in these conversations. People who have difficulty comprehending and composing essays also struggle to work with repeated prompts in AI systems like ChatGPT to reach a solution. I've found in practice that when showing someone how prompting works, their understanding either clicks instantly, or they fail to grasp it at all. There appears to be very little in between.

asah 1232 days ago

seems like this is the future... 1. first day of class, write a N word essay and sign a release permitting this to be used to detect cheating. The essay topic is chosen at random.

2. digitize & feed to learning model, which detects that YOU are cheating.

upside: this also helps detect students who are getting help (e.g. parents)

downside: arms race as students feed their cheat-essays (memorize their essays?) into AI-detection models that are similarly trained.

kaibee 1232 days ago

The funniest implication here is that the student's writing skill isn't expected to improve.

eh9 1232 days ago

I was just asking my partner who’s a writer if it would even be fair to train a model based on a student at Nth grade if the whole point is to measure growth. Would there be enough “stylistic tokens” developed in a young person’s writing style?

ask_b123 1232 days ago

Personally, I feel mildly embarrassed when reading my essays from years prior. And I probably still count as a 'young person'.

That said, there's no need to consider changes in years when stylistic choices can change from one day to another depending on one's mood, recent thoughts, relationship with the teacher, etc.

That's why I've always been a little confused about how some (philologists?) treat certain ancient texts as not being written by some authors due to the text's style, as if ancient people could not significantly deviate from their usual style.

Aransentin 1232 days ago

> first day of class, write a N word essay

Initially I thought you meant having the student write an essay about slurs, as the AI will refuse to output anything like that. Then I realized you meant "N" as in "Number of words".

Still, that first idea might actually work; make the students write about hotwiring cars or something that's controversial enough for the AI to ban it but not controversial enough that anybody will actually care.

dragonwriter 1232 days ago

> upside: this also helps detect students who are getting help (e.g. parents)

Downside: it also likely detects, without differentiation, students whose writing style undergoes a major jump because of learning, which is, you know, the actual thing you are trying to promote.

JumpCrisscross 1232 days ago

> first day of class, write a N word essay and sign a release permitting this to be used to detect cheating

Why once? Most students need writing skills more than half the high-school curriculum.

feanaro 1232 days ago

There are also some countries that don't fetishize cheating this much so perhaps they will just continue not caring.

Zababa 1232 days ago

Arms races are not really an issue: you've managed to make your students work, one way or another.

userbinator 1232 days ago

Programming is fortunately one of those subjects where there's something objectively close to a correct/optimal solution. A trivial example is that there aren't very many sane ways to write a "Hello world" program, but this seems to hold for more complex tasks too. In fact, in my experience, the ones who cheat and get it wrong are the most obvious.

Unfortunately, the software industry also has plenty of literal tools who are far too trusting of what the computer says (or authority in general, but that's another rant...)

Gigachad 1232 days ago

I once got called up because my work was flagged as 100% copied. I had uploaded it, made a mistake so I deleted it and uploaded a new file. Second file was flagged as copied. Was able to explain it by pointing at the screen that was claiming I plagiarized my own name.

runarberg 1232 days ago

So the computer’s evaluation model assumed that each student’s learning is independent? That seems like a ludicrous assumption to put in a model like this, unless the model authors have never been in a class setting (which I doubt).

TheDudeMan 1232 days ago

You are asking teachers to be good at their job. But is teaching a merit-based profession?

TheRealPomax 1232 days ago

So, status quo then? This is already the case for educational software that's used to detect plagiarism. People get wrongly flagged, and then you'll have to plead your case.

But the times software like this finds actual problems vastly outnumber the times it doesn't, and when your choice is between "passing kids/undergrads who cheat the system" and "the occasional arbitration", you go with the latter. Schools don't pay teachers anywhere near enough to not use these tools.

PeterisP 1232 days ago

Currently the false positive rate is far lower. E.g. if I get 500-ish submissions over a school year, then a 1% false positive rate would mean I'd falsely accuse 5 innocent students annually, which isn't acceptable at all - and a 9% FP rate is so high that it's not even worth investigating; do you know of any grader who has the spare time to begin formal proceedings/extra reviews/investigation for 9% of their homework?

For plagiarism suspicions at least the verification is simple and quick (just take a look at the identified likely source, you can get a reasonable impression in minutes) - I can't even imagine what work would be required to properly verify ones flagged by this classifier..
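
As a rough sketch of the arithmetic above, assuming every one of the 500-ish submissions is honest work and that the quoted rate applies uniformly to each of them (the function name `expected_false_flags` is just illustrative):

```python
def expected_false_flags(submissions: int, false_positive_rate: float) -> float:
    """Expected number of honest submissions that get flagged anyway."""
    return submissions * false_positive_rate


if __name__ == "__main__":
    submissions = 500  # "500-ish submissions over a school year"
    for rate in (0.01, 0.09):  # the 1% and 9% rates discussed above
        flagged = expected_false_flags(submissions, rate)
        print(f"FP rate {rate:.0%}: about {flagged:.0f} honest submissions flagged")
```

At 1% that is about 5 false accusations a year; at 9% it is about 45, each of which would need its own investigation.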

Fomite 1232 days ago

I really wish they'd have provided their false positive rate over several lengths of document, rather than an overall estimate. Because if it dives after say, 1,500 words, that's a relevant piece of information for its use.

I'm pessimistic, given they chose not to do so.

TheRealPomax 1232 days ago

> I can't even imagine what work would be required to properly verify ones flagged by this classifier.

Yet.

flatline 1232 days ago

At the same time the classifier is improving, the generative models are improving. It’s a classic arms race and this equilibrium is not likely to shift much either way. We are talking about models that approximate human behavior with a high degree of accuracy, I think the goal would be to make them indistinguishable in any meaningful way.

PeterisP 1232 days ago

Can you elaborate?

I don't think that this is something that can change through tech advances for the classifiers - in all cases the classifier is just flagging for investigation, it's not sufficient for any action. For plagiarism, appropriate evidence comes from a person comparing the submission with the possible source of plagiarism. For this one, the proper evidence would require getting confirmation that the student actually generated that data - e.g. identifying the exact tool and prompt that was used, or logs from the students' computer showing that this was done, or logs from the text generation service provider. All of those are quite tricky to get and perhaps even not possible.

michaericalribo 1232 days ago

Given the published true and false positive rates, it's clear that the true positives do not "vastly outnumber" false positives.

notahacker 1232 days ago

> This is already the case for educational software that's used to detect plagiarism. People get wrongly flagged, and then you'll have to plead your case.

How often is that the case though? A while since I've had to worry about it, but I thought plagiarism detection generally worked on the principle of looking for the majority of the content being literal matches with existing material out there with only a few small edits, which - unlike using some "AIish" turns of phrase a bot wrongly attributes to humans 9% of the time and correctly attributes to AI with a not much better success rate - is pretty hard to do accidentally.

i_have_an_idea 1232 days ago

A long time ago when I was a student, I would run my papers through Turnitin before submitting. The tool would sometimes mark my (completely original) work as high as mid 20% similarity.

As a result, I have taken out quotes and citations to appease it and not have to deal with the hassle.

I expect modern day students will resort to similar measures.

notahacker 1232 days ago

IIRC the marker got the same visualization that you used to take out quotes and citations that highlighted that the similar bits were in fact quotes and citations!

Maybe high school is a different matter, but I'm pretty sure even the most technophobic academic knows that jargon, terse definitions and the odd citation overlapping with stuff other people have written is going to make a similarity of at least 10% pretty much inevitable, especially when the purpose of the exercise is to show you understand the core material well enough to cite and paraphrase and compare it, not to generate novel academic insight or show you understood the field so well you didn't need to refer back to the source material. The people they were actually after were the ones that downloaded something off essaybank, removed a couple of paragraphs and rewrote the intro to match the given title and ended up with 80%+ similarity

claytonjy 1232 days ago

Is there a longer-form paper on this yet? TPR (P(T|AI)) and FPR (P(T|H)) are useful, but what I really want is the probability that a piece flagged as AI-generated is indeed AI-generated, i.e. P(AI|T). Per Bayes rule I'm missing P(AI), the portion of the challenger set that was produced by AI.

If we assume the challenger set is evenly split 50-50, that means

    P(AI|T) = P(T|AI)P(AI)/P(T) = (0.26)(0.5)/((0.26)(0.5)+(0.09)(0.5)) ~ 74%

So roughly a 3/4 chance of the flagged text actually being AI-generated.

They say the web-app uses a confidence threshold to keep the FPR low, so maybe these numbers get a bit better, but very far from being used as a detector anywhere it matters.
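
A minimal sketch of the Bayes calculation, with P(T) expanded as P(T|AI)P(AI) + P(T|H)P(H) so that each rate is weighted by its prior. The 26% and 9% figures are the ones quoted in this thread; the priors other than 50-50 are assumptions chosen purely for illustration, and `prob_ai_given_flag` is a made-up name.

```python
def prob_ai_given_flag(tpr: float, fpr: float, prior_ai: float) -> float:
    """P(AI | flagged) via Bayes' rule.

    P(T) is expanded over both sources of flags:
    P(T) = P(T|AI) * P(AI) + P(T|H) * P(H)
    """
    true_flags = tpr * prior_ai            # AI-written and flagged
    false_flags = fpr * (1.0 - prior_ai)   # human-written but flagged
    return true_flags / (true_flags + false_flags)


if __name__ == "__main__":
    tpr, fpr = 0.26, 0.09  # rates quoted in this thread
    for prior in (0.5, 0.2, 0.05):  # assumed shares of AI-written text
        posterior = prob_ai_given_flag(tpr, fpr, prior)
        print(f"P(AI) = {prior:.0%} -> P(AI|T) = {posterior:.0%}")
```

With an even split the posterior comes out near 74%, and it drops quickly as the assumed share of AI-written text shrinks.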

TchoBeer 1232 days ago

>Per Bayes rule I'm missing P(AI), the portion of the challenger set that was produced by AI

This will obviously depend on your circumstances.

drc500free 1232 days ago

Precision is impossible to calculate without knowing P(AI), which is use-case specific.

Source: Spent 10 years trying to explain this to government people who insisted that someone tell them Precision based purely on the classifier accuracy without considering usage.

jameshart 1232 days ago

We can’t release the essay writing language model. Lazy children will use it to write their essays for them!

We can’t release the ai-generated text detection model. Lazy teachers will use it to falsely accuse children of cheating!

The problem here appears to be lazy people.

Can we train an AI to detect lazy people? I promise not to lazily rely on it without thinking.

screye 1232 days ago

Hilariously, this has already happened with music composition. Especially drumming.

Since the advent of drum machines, a lot of younger players have started playing with the sort of precision that drum machines enable. eg: The complete absence of swing, and clean high-tempo blasts/rides.

So you'd get accusations of drummers not being able to play their own songs, because traditional drummers think such technically complex and 'soulless' performances couldn't possibly be human. Only to then be proven wrong, when it turns out that younger players can in fact do it.

The machine conditions man.

e_i_pi_2 1232 days ago

I can't remember the keyword to look it up, but there's a problem of statistics you run into with stuff like terrorism detection algorithms

If we have 300M people in the US and only 1k terrorists, then you need 99.9999% accuracy before you start getting more true positives than false positives. If you use this in a classroom where no one is actually using AI you'll get false positives, and in a class where the usage is average you'll still get more false positives than true ones, which makes the test do more harm than good unless it's just a reason to look into it more - and the teacher is presumably already reading the text so if that doesn't help then this surely won't
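
A hedged sketch of the break-even point described above. It assumes the 26% / 9% rates quoted elsewhere in this thread for the classroom case and, for the terrorism example, a detector that never misses (TPR of 1.0); both function names are invented for illustration.

```python
def break_even_prevalence(tpr: float, fpr: float) -> float:
    """Share of positives at which true flags equal false flags.

    Solves tpr * p == fpr * (1 - p) for p.
    """
    return fpr / (tpr + fpr)


def max_fpr_for_more_true_than_false(positives: int, population: int, tpr: float = 1.0) -> float:
    """Largest false positive rate that still yields at least as many true as false flags."""
    negatives = population - positives
    return tpr * positives / negatives


if __name__ == "__main__":
    # Classroom: below this share of AI-written essays, most flags are false.
    print(f"break-even AI share: {break_even_prevalence(0.26, 0.09):.1%}")

    # Terrorism example: 1k positives among 300M people, perfect TPR assumed.
    limit = max_fpr_for_more_true_than_false(1_000, 300_000_000)
    print(f"required specificity: {1 - limit:.5%}")
```

Under these assumptions, flags are more often wrong than right whenever fewer than roughly a quarter of the essays are AI-written, and the terrorism case needs a false positive rate of a few per million, the same order of magnitude as the figure quoted above.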

xmddmx 1232 days ago

It's the False Positive Paradox: https://en.wikipedia.org/wiki/Base_rate_fallacy#False_positi...

Verdex 1232 days ago

I wonder if I should help my kids set up a server + webcam + screen capture tool so they can document 100% of their essay writing experience. That way if they ever get hit with a false positive they can just respond with hundreds of hours of video evidence that shows them as the unique author of every essay they've ever written.

anotherjesse 1232 days ago

You will certainly have a lot of training video to create an "essay writing video generator" ML product

causalmodels 1232 days ago

You could always teach them how to use git and have them commit frequently. Seems like it would be less intrusive than a webcam.

Verdex 1232 days ago

Source control would certainly help establish a history of incrementally performing school work by someone when viewed by a highly technical examiner and when periodically stored someplace where a trusted 3rd party can confirm it wasn't all generated the night after a supposed false positive.

However, hundreds of hours of video is compelling to non-technical audiences and even more importantly is a preponderance of evidence that's going to be particularly damning if played in front of a PTA meeting.

With a git history it's going to come down to who can spin the better story. The video is the story and everyone recognizes it, so I expect fewer people would bother even challenging its authenticity.

causalmodels 1232 days ago

I guess that's fair. I just personally don't think the additional gain is worth taking away your child's privacy.

Verdex 1232 days ago

It's only taking away their privacy if they're falsely accused.

And properly used you might not even have to relinquish privacy if falsely accused. A quick montage video demo and a promise to show the full hundreds of hours of video of "irrefutable" proof to embarrass the school district at the next PTA meeting might be sufficient to get the appropriate response.

tshaddox 1232 days ago

You could still cheat quite easily and inexpensively with an earpiece, as long as you know how to write down what you hear.

Verdex 1232 days ago

It's about building a narrative. Yeah, you could still cheat, but who would go through the effort of generating hundreds of hours of fake videos proving yourself innocent. For that amount of effort you might as well have done the work yourself.

Of course there are some people who put insane amounts of effort into not doing "real" work. However, anyone trying to prove that your child is in that position is going to find themselves in an uphill battle.

Which is the ultimate goal here. Make people realize that falsely accusing my children using dubious technology is going to be a lot more work than just giving up and leaving them alone.

saltysnowball 1232 days ago

This is already an issue. I'm a student in college right now, and even technical professors are operating with full confidence in systems like Turnitin, which try their hand at plagiarism detection (often with much higher false negative/false positive rates). The problem was even more prevalent in high school, where teachers would treat it as a 100% certainty. Thus, I think that OpenAI making at least a slightly better classification algorithm won't make the state of affairs any worse.

dougmwne 1232 days ago

The cheating students who know how to use the classifier will be the big winners.

ibejoeb 1232 days ago

I think there is a more dystopian near future:

1. There will be commercial products to tune per-student writing models.

2. Those models will be used to evaluate progress and contribute directly to scores, grades, and rankings. They may also serve to detect collaboration.

3. The models persist indefinitely and will be sold to industry for all sorts of purposes, like hiring.

4. They will certainly be sold to the state for law enforcement and identity cataloging.

tshaddox 1232 days ago

It's almost as if you need to give exams in person and watch the students if you don't want them to cheat. This is fundamentally no different than cheating by writing notes on your hand in an exam or paying someone to write a take-home essay for you. It's cheaper than the latter, but that just means the lazy curriculum finally needs to be updated.

cjbgkagh 1232 days ago

> false positive rate of 9%

Yeah, that is useless. You couldn't punish based on that alone and students will quickly figure out to never confess.

janalsncm 1232 days ago

I urge anyone with time to write to tech journalists explaining why this is so bad. Given previous coverage of GPTZero they don’t seem to be asking the right questions.

sometimeshuman 1232 days ago

Sorry for the tangent, but a surprising number of the general public don't know the meaning of percent[1]. So even if a teacher is told those percentages, many wouldn't know what to conclude.

[1] Me, giving young adults that worked for me a commission rate. Then asking if their commission rate is 15% and they sell $100 of goods what is their payment. Many failed to provide an answer.

tremon 1232 days ago

I dare hope for a less dystopian outcome:

- teachers will assign less mind-numbing essay homework assignments and focus more on oral interviews.

Fomite 1232 days ago

That heavily favors a particular learning style, which isn't necessarily a desirable outcome.

juve1996 1232 days ago

You can't make everyone happy.

Fomite 1232 days ago

Generally speaking, education (when done correctly) tries to avoid "...and devil take the hindmost" as a guiding philosophy.

juve1996 1232 days ago

Mass education is like mass transit. It gets the majority of the population somewhere. Not everyone gets to take the ferrari. Someone will always be left out and we shouldn't let perfect be the enemy of good enough.

tremon 1232 days ago

...but if we are forced to choose, it's better to spend our effort on the ones who don't own a Ferrari.

rvba 1232 days ago

I guess students will get recorded writing their homework, say on a tablet.

Then of course the AI can whisper what to write into the student's ear. So perhaps homework will have to be done at school? A school that checks its students with a metal detector when they enter. (Some schools already use them to check for guns?)

On a side note, I'm very shocked at how lax everything is in those professional chess tournaments. It feels like there are many ways to cheat and they don't try to do anything against cheating. They should use metal detectors (to detect computers inside a stomach or tooth), they should host everything inside a bunker (so no radio), without an audience (who can do various tricks) and in a secured environment (all cameras checked to be sure they are legit).

Those chess tours look like cheating galore to me, although I don't play chess.

thewataccount 1232 days ago

Hopefully they just flag relevant sections. Essay/Plagiarism checkers already exist, although in my experience professors were reasonable.

For example I had a paragraph or two get flagged as being very similar to another paper - but both papers were about a fairly niche topic (involving therapy animals) and we had both used the relevant quotes from the study conclusions from one of only a few decent sources at the time - so of course they were going to be very similar.

Given that most essays are about roughly the same set of topics, and there are literally hundreds of thousands of students writing these - I wonder how many variations are even possible for humans to write as I would expect us to converge on similar essays?

michaericalribo 1232 days ago

Plagiarism is easier to verify, because you can directly compare with the plagiarized source material

thewataccount 1232 days ago

Absolutely. I think it may have to end up more as a statistics thing with behaviour. For example:

"Tom had a single paragraph flag as possibly generated" vs "Every single paper Tom writes has paragraphs flag"

Basically we might have to move to detecting statistical outliers as cheating. Now whether the tools/teachers will understand/actually do that - we can only hope....
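
A rough sketch of that outlier idea, assuming the 9% false positive rate quoted upthread applies per paper and that flags on different papers are independent (both are simplifying assumptions, and `prob_at_least` is just an illustrative name):

```python
from math import comb

def prob_at_least(k, n, p=0.09):
    """P(X >= k) for X ~ Binomial(n, p).

    Read as: the chance that an honest student collects k or more
    false flags across n papers, if each paper is falsely flagged
    with probability p, independently of the others (assumed).
    """
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# "Tom had a single flag" vs "every single paper Tom writes gets flagged"
print(f"at least 1 flag in 10 papers: {prob_at_least(1, 10):.3f}")
print(f"10 flags in 10 papers:        {prob_at_least(10, 10):.2e}")
```

Under those assumptions a single flag over ten papers is more likely than not for an honest student, while ten out of ten is vanishingly unlikely by chance. In practice flags are probably correlated with a student's writing style, so the real tail would be fatter than this.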

blueblimp 1232 days ago

That's a good point: the effectiveness at detecting AI generation is probably going to depend strongly on the length of the text.

ren_engineer 1232 days ago

>false positive rate of 9%

bringing the Roman decimation to the classroom based on AI, this is the future

deltree7 1232 days ago

Also, there will exist

Prompt => AIGen (White Hat) => Obfuscate(Black Hat) => Final Text

Fomite 1232 days ago

I think the much more proximate threat is that fear of ChatGPT kills a lot of progress that's been made in making exam material more accessible (take home tests, etc.) to a broader audience of students.

jupp0r 1232 days ago

This is worse than useless, if taking base rate fallacy into account.
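
To make the base rate point concrete, here is a small Bayes' rule sketch using the 26% true positive and 9% false positive rates quoted at the top of the thread. The share of essays that are actually AI-written is unknown, so the priors below are made-up values for illustration:

```python
def posterior_ai_given_flag(prior_ai, tpr=0.26, fpr=0.09):
    """P(essay is AI-written | classifier flagged it), via Bayes' rule.

    prior_ai is the assumed fraction of submitted essays that are
    AI-written; tpr and fpr are the rates quoted in the thread.
    """
    flagged_ai = tpr * prior_ai              # AI-written and flagged
    flagged_human = fpr * (1.0 - prior_ai)   # human-written but flagged
    return flagged_ai / (flagged_ai + flagged_human)

# Hypothetical base rates, not measured values
for prior in (0.05, 0.10, 0.50):
    post = posterior_ai_given_flag(prior)
    print(f"{prior:.0%} of essays AI-written -> P(AI | flagged) = {post:.1%}")
```

With a 10% base rate, fewer than one in four flagged essays would actually be AI-written, so most accusations would land on honest students.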

dirtyid 1232 days ago

I'm imagining a future that involves programs monitoring students writing in a proctored setting to establish some sort of individual fingerprint, then using that to match against future writing assignments, again persistently monitored for authenticity. Clippy is going to pop up in the corner to warn you when you've been behaving too artificially. Whatever that means.

nonrandomstring 1232 days ago

A more likely outcome is that teachers will pay the price [1].

[1] https://www.timeshighereducation.com/opinion/ai-will-replace...

(turn off js to jump signup-wall)

kilgnad 1232 days ago

This isn't that dystopian. The dystopian outcome is when there's a classifier that rates the quality of the text, and that classifier becomes indistinguishable from the AI-generated-text classifier because AI-generated text is beginning to be superior to human-generated text.

headsoup 1232 days ago

Ah but as with AI generally before now, 'you can't stop progress.' It'll end up being used and falling into an arms race of better AI vs better detection, all the while losing the point of why it is there at all in the first place.

la64710 1232 days ago

Exactly. IMHO it is irresponsible to release such a classifier with a title that touts the desired feature and does not spell out its limitations at all. At least precede such a title with "experimental" or something.

adamsmith143 1232 days ago

Or we realize that essays aren't that important and technical skills will become more highly valued. Either way, ChatGPT can't do your exams for you so the truth will come out anyway.

mitchdoogle 1232 days ago

Writing is very important for understanding a topic and long-term recall. I still remember topics from papers I did 15 years ago because I spent 10s of hours researching and writing and forming ideas about each topic.

Instead of being overzealous about catching cheaters, teachers should learn to express the importance of writing and why it is done. Convince the students that they should do it to be a smarter person, not just to get a grade, and they will care more about doing it honestly.

8note 1232 days ago

Writing is itself a technical skill

With AI taking over technical skills, it seems clear to me that they will be valued less. Instead, the soft skills will be the valued ones.

kmkemp 1232 days ago

Any solution here is just an arms race. The better AIs get at generating text, the harder the job of identifying whether an AI was responsible for writing a given text sample.

e_i_pi_2 1232 days ago

You could even just set up a GAN to make the AI better at not being detected as something written by an AI. I don't see a good general solution to this, but I also see it as a non-issue: if students have better tools they should be able to use them, just like a calculator on a test. That's allowed on tests because you still need to understand the concepts to put it to use.

mitchdoogle 1232 days ago

4. Parents sue schools 5. Admins eliminate all writing requirements

p-e-w 1232 days ago

Let me soothe your fear: This isn't a novel cheating technology, it's a technology that will make humans obsolete. Neither teachers nor students are going to matter in the future. Most or all of the population is going to be enslaved for all practical purposes, either to an all-powerful super-elite, or to AI itself. Any worries about how mundane things like education are going to be impacted by petty cat-and-mouse games are going to become irrelevant, because education itself is going to be irrelevant, along with everything else that once defined our world.

blueblimp 1232 days ago

> true positive rate of only 26% and a false positive rate of 9%

That's uninformative enough that I'm surprised they launched this publicly at all.

mr_toad 1232 days ago

Maybe they should start asking questions that AI can’t answer, instead of having students regurgitate what they’ve memorised.

flandish 1232 days ago

In the same way deepfake video should not be allowed as evidence, thereby ensuring no video is allowed… we can apply that to text as well.

We’re entering an uncanny valley before a period of “reset” with self taught (to stay on subject here) people re-learning for the sake of learning.

In 30 years we will be in an educational renaissance of people learning “like the old masters did in the 1900’s.”

EGreg 1232 days ago

Nah. In 30 years it will be as useless to learn most subjects as it is right now to learn crocheting and knitting, or to learn times tables or use an abacus.

People are wayyyy too optimistic, just like in the 1900s they thought people would have flying cars but not the Internet, or how Star Trek’s android Data is so limited and lame.

Bots will be doing most of the work AND have the best lines to say, AND make the best arguments in court etc.

You don’t even need to look to AI for that. The best algorithms are simply uploaded to all the bots and they are able to do 800 things, in superhuman ways, and have access to the internet for whatever extra info they need.

When they swarm, they’ll easily outcompete any group of humans. For example they can enter this HN thread and overwhelm it with arguments.

No, the old masters were needed. Studying will not be. The Eloi and Morlocks is closer to what we can expect.

tokai 1232 days ago

Apparently knitwear is forecast to have a CAGR of 12% for the rest of the decade, with hand-knitted garments commanding the high prices. It's definitely not the worst cottage industry one can choose.

EGreg 1232 days ago

https://xkcd.com/1102/

flandish 1232 days ago

As someone who’s known how to crochet and knit since he was 6… I disagree.

bilater 1232 days ago

yup - you can beat em using simple things like mixing up words to throw off the word distribution. GPT-Minus1 is an example.

https://gptminus1.vercel.app/

amelius 1232 days ago

Solution: just write your texts with a bit less confidence than gpt3 would.

Kiro 1232 days ago

Funny how everyone praised GPTZero that has even worse rates but starts being skeptical when it's OpenAI, the new bad guy.

dns_snek 1232 days ago

"Everyone" didn't. In fact, the 5 top comments in that thread[1] all called it useless or pointed out serious flaws.

[1] https://news.ycombinator.com/item?id=34556681

anonobviously 1232 days ago

This is extremely concerning.

The co-authors on this include Professor Scott Aaronson. Reading his blog Shtetl-Optimized and his [sad/unfortunate/debate-able/correct?/factual?/biased?] views on adverse/collateral harm to Palestinian civilians makes me question whether this model would fully consider collateral damage and harm to innocent civilians, whoever that subgroup might be. What if his model works well, except for some minority groups' languages which might reflect OpenAI speak? Does it matter if the model is 99.9% accurate if the 0.1% is always one particular minority group that has a specific dialect or phrasing style? Who monitors it? Who guards these guards?