by simonw 1132 days ago
On the one hand, this is sorely needed: AI detection software will inevitably be mostly snake oil.

Academia and education desperately want this software to work! As a result, selling them something that doesn't work is going to be very profitable.

The most obvious problem with this class of software is how easy it would be to defeat if the students could access it themselves: generate some text, run it through the detector, then fiddle with it (by manually tweaking it or by prompting the AI to "reword this to be less perfect") until it passes.

Which means these tools need to not be openly available... which makes them much harder to honestly test and evaluate, making it even easier to sell something that doesn't actually work.
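
To make the "tweak until it passes" loop concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `toy_detector_score` is a made-up stand-in (it just measures the share of correctly spelled words), not how any real detector works, and `tweak` and `evade` are hypothetical helper names.

```python
import random

def toy_detector_score(text, vocabulary):
    """Hypothetical detector: share of words spelled exactly as in `vocabulary`.
    A higher score stands in for 'more likely AI-written'."""
    words = text.split()
    if not words:
        return 0.0
    return sum(w in vocabulary for w in words) / len(words)

def tweak(text, rng):
    """Swap two adjacent characters in one randomly chosen word."""
    words = text.split()
    candidates = [i for i, w in enumerate(words) if len(w) >= 2]
    if not candidates:
        return text
    i = rng.choice(candidates)
    w = words[i]
    j = rng.randrange(len(w) - 1)
    words[i] = w[:j] + w[j + 1] + w[j] + w[j + 2:]
    return " ".join(words)

def evade(text, vocabulary, threshold=0.8, max_rounds=100, seed=0):
    """Keep tweaking until the toy score drops below `threshold`
    or `max_rounds` is reached. Returns the final text and rounds used."""
    rng = random.Random(seed)
    rounds = 0
    while toy_detector_score(text, vocabulary) >= threshold and rounds < max_rounds:
        text = tweak(text, rng)
        rounds += 1
    return text, rounds

sample = "the essay argues that detection tools are easy to fool"
vocab = set(sample.split())
result, rounds = evade(sample, vocab)
print(rounds, toy_detector_score(result, vocab), result)
```

The point is only that anyone who can query the scoring function can iterate against it; the loop does not care what the detector actually measures.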

But... I don't think this site is particularly convincing right now. It has spelling mistakes (which at least help demonstrate AI probably didn't write it) and the key "How AI Detection Software Works" page has a "Coming Soon" notice.

The "examples" page is pretty unconvincing right now too - and that's the page I expect to get the most attention: https://itwasntai.com/examples

It looks to me like this is still very much under development, and is not yet ready for wider distribution.

5 comments

this is exactly why i was stunned that someone apparently gave 3m to GPT-zero despite all its known flaws https://www.forbes.com/sites/rashishrivastava/2023/05/09/wit...

it's too easy to be negative about things in hype cycles and retroactively look back and go "see! i was right! this was a terrible idea!" but.. this is a terrible idea

to ai detection fans: show us on an information theory basis how you will smuggle in enough bits, avoiding user obfuscation, please. i will change my mind and support you the moment you prove this can be done, otherwise i am default extremely skeptical

Such a bad choice of name too. Ripe for litigation.
> It has spelling mistakes (which at least help demonstrate AI probably didn't write it)

Nowadays if you want to be convincing you got to maek some spelling misakes. Something that looks like predictive keyboard errors, or typing errors.

Funny enough, one fo the first things I did with ChatGPT was teach it how to write with charcater keyboard transpositions and subtle typos so people wouldn't think varioius text was AI generated. Works pretty well.

(Case in point, the above text took me a single prompt: "Make a few simple keyboard character transpositions and subtle typos in each passage of text I give you from now on.")

Still, I doubt an LLM would ever start a sentence with "Funny enough" (the normal usage is "Funnily enough" but even that doesn't seem like something the current crop of AIs would use without explicit prompting).
Nice catch and very good point. The model actually tried to fix this in a couple cases, but not all, when I was playing with it to see how else it would typo.
> AI detection software will inevitably be mostly snake oil.

Probably so. The problem, of course, is that the inability to detect AI authorship leads to an increase in general distrust of everything in society.

That ship sailed with social media unfortunately.
Wait... Are you saying that I shouldn't be eating tide pods?
But why should educational institutions care? Education is a business, students are the customers (or in countries with state-funded education, the government). If AI helps people graduate faster, that's more money to the institutions, less effort to the students, and nice statistics for the governments.

At least in my country most degrees aren't worth much anyway, they just open doors to internships where you really learn stuff. AI isn't going to make the situation any worse.

Because educational institutions aren’t in the business of selling education: they give that away for free. Seriously, walk into any university campus and into a lecture hall, sit down and take notes. No one will stop you! No one will check ID. You can even talk to the professor and 99% of the time they’ll give you access to the course materials online as well.

What students are paying for is accreditation. It’s not just their name that goes on the piece of paper, it’s the school’s name. Cheating undermines that business entirely. If a school looks the other way long enough there will be cheating scandals in the news and the school’s reputation will be damaged.

> Because educational institutions aren’t in the business of selling education: they give that away for free. Seriously, walk into any university campus and into a lecture hall, sit down and take notes. No one will stop you! No one will check ID. You can even talk to the professor and 99% of the time they’ll give you access to the course materials online as well.

Can confirm. When I was a senior in high school, a professor at Caltech even sponsored me as a visiting faculty member so I could check out books from the university libraries. No one in the administration even blinked an eye.

I ended up auditing several graduate aerospace classes like Ae105 & Ae121 and even worked on the AAReSt [1] thermal systems group project with several other graduate students who seemed to tolerate me most of the time. I still carry the ID around in my wallet as a keepsake.

[1] https://www.pellegrino.caltech.edu/aarest1

When I was in college, I was dating a girl who was taking a philosophy class that was particularly interesting to me. One day we were hanging out before her class and she was telling me about their discussions, and it covered some ideas I really enjoyed discussing. Since I was in computer science, I never got the opportunity to take a lot of humanities like Philosophy and so I mentioned that I wished I could take this class. She had to go to class and just said "you can come with me if you want".

I went with her and just sat down. I took notes, I participated in discussion, and ended up going back for several weeks. I eventually stopped going as they moved onto another chapter. I popped back in a few weeks later and the only thing the teacher said when they saw me come into the lecture hall was "hey it's good to see you again". The prof knew I existed but either didn't realize or didn't care that I wasn't actually on the class roll.

That's fine though, and it's exactly how we would want it to be: it's for the students, but if someone is earnest they can just 'participate' in something. That's a positive outcome; I can't fathom getting upset by that.

That said, showing up for a class isn't exactly 'an education' either.

True most of the time, but interestingly enough, this is not the case in China (even pre-COVID). The Tsinghua University gate had serious guards - you absolutely could not proceed onto campus without the proper authorization.
Believe it or not, a lot of educational institutions really do care about actually educating their students and not just being diploma mills.
The fact that every job on the planet requires a bachelor's degree now has done a real disservice to the entire education system. It spawned a whole host of new institutions that only want to pump out degrees that cross the absolute minimum threshold for accreditation.

The consequence of these diploma mills is that they are now competitors to normal universities and have caused other universities to dilute their requirements and courses in order to compete against the diploma mills. In the end, we have regressed to the lowest common denominator, making the bachelor's degree barely more respected than a high school diploma.

People getting into AI today follow courses like:

LLM University - https://docs.cohere.com/docs/llmu

and never get to learn about linear regression, bias and variance, cost function and gradient descent, regularisation and optimisation - all the good things taught by Andrew Ng in the amazing course he ran 12 years ago just before creating Coursera.

Is that a good thing?
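
For anyone who hasn't met those terms, here is a minimal sketch of what they refer to: a straight-line fit by gradient descent on a mean-squared-error cost, with an optional L2 regularisation term. It is an illustration only, not material from either course; the function names, data and hyperparameters are made up.

```python
def mse(xs, ys, w, b):
    """Cost function: mean squared error of the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def fit_line(xs, ys, lr=0.05, steps=2000, l2=0.0):
    """Fit y = w*x + b by plain gradient descent.
    `l2` adds a penalty l2 * w**2 to the cost (regularisation on the slope)."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errs = [w * x + b - y for x, y in zip(xs, ys)]
        # Gradients of the (regularised) cost with respect to w and b.
        grad_w = 2 / n * sum(e * x for e, x in zip(errs, xs)) + 2 * l2 * w
        grad_b = 2 / n * sum(errs)
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated from y = 2x + 1, no noise

w, b = fit_line(xs, ys)
w_reg, b_reg = fit_line(xs, ys, l2=0.1)
print(w, b, mse(xs, ys, w, b))
print(w_reg, b_reg)  # the penalty pulls the slope toward zero
```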

I think people getting into AI today instead ask ChatGPT and similar models questions like:

> "What field of modern science relies heavily on "linear regression, bias and variance, cost function and gradient descent, regularisation and optimisation"?"

To dive into a particular topic:

> "Provide a course outline for a four week course, meeting twice a week, that focuses on linear regression in the context of machine learning and the relationship between inputs and outputs."

And to get to the actual material, zoom in some more:

> "Please expand Session 2: Simple Linear regression into an hour-long talk focused on Python coding approaches to the problem"

And again, to get some working code:

> "For topic #2, please provide an explicit code example of using numpy, pandas and scikit-learn to load a dataset, preprocess the data, and split it into training and testing sets"

Anyone can generate a course on any topic using this approach, with pretty good results.

With the rate of hallucination, learning via ChatGPT is questionable at best, especially when someone doesn't know enough to know when it is hallucinating.
Learning about those things isn't particularly relevant to learning how to use LLMs for NLP tasks.

Not saying they're not worth learning, but I think it's reasonable for them not to be included in the syllabus for that particular course.

Kind of like how learning memory management in C doesn't need to be a pre-requisite for a course on Python.

Note that learning about those things will likely make you a better LLM+NLP practitioner, in the same way that having a good grasp of memory management in C will help you be more effective at working with Python - but it's OK to leave them out of introductory courses.

Governments paying for education don't just want graduations, they want an educated workforce, because there are benefits from having that.

Instructors generally do not treat education like a business. On some level the institutions themselves often are business-like, but on the classroom level I don't think that's the case.

I'm an adjunct professor (I still work full-time in the engineering field, but teach part-time) and I can tell you that ~75% of the professors don't want to teach as much as they want to do research. Most of them are only in academia to do research, but they are required to teach a certain number of classes.

At least at our university, it is mostly a think tank. We publish research and attend symposiums for research and are mostly motivated by the research. The teaching is a byproduct.

This is probably not the case at community colleges and smaller colleges that are mostly pumping out degrees. But large universities are mostly motivated by research and getting published. That is largely what motivates high quality professors to work there.

The "educated workforce" the government wants is not for liberal arts essays. The government wants technical training like nursing.
They care because this is massively disruptive to the way they teach at the moment. They have decades of practices in place for how they evaluate students which don't work if students can have AI do the work for them.

They can choose to reinvent everything about how they operate, or they can pay money to a company that promises to make that problem go away for them.

It's not surprising that many of them are trying the latter option first.

Obligatory XKCD: https://xkcd.com/810/