Hacker News
by evolutionas 2566 days ago
If this is the state of the art then I have some bad news. I recently finished writing my thesis and copy-pasted several paragraphs that did not include any mathematical formulas. Most of them were classified as written by a machine (8 out of 12). It might be because I am not a native speaker. It seems that the model struggled the most where mathematical details are discussed, but on the topics where I wrote more freely, such as conclusions and some analysis, it classified the text as written by a human.
2 comments

Thanks for sharing! If your content (such as a thesis) is out of domain (not a news article) all bets are off on how the model will perform.
Isn't that an issue? For instance, if someone made a video 'out of your domain' (e.g. with a different model than the internal training examples), how would the model perform? Would the AUC be impacted? What is the PPV? It seems common in these results that people are experiencing false positives; I did as well. If the percentage of fake news that we read is 10% and the model (AUC + operating point on a test set, unpublished) has 92% sensitivity and specificity, we would still expect that ~44% of model positives are false positives. If the "accuracy" is computed on an unbalanced dataset, what is to be taken from it?
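For anyone who wants to check the arithmetic, here is a minimal sketch of the PPV calculation under the figures quoted above (10% prevalence, 92% sensitivity and specificity). The function name is my own and purely illustrative; nothing here comes from the model's release, and the real operating point is not published.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of flagged items that are truly positive (Bayes' rule)."""
    # Fraction of all items that are fake AND flagged as fake.
    true_pos = prevalence * sensitivity
    # Fraction of all items that are genuine BUT flagged as fake.
    false_pos = (1.0 - prevalence) * (1.0 - specificity)
    return true_pos / (true_pos + false_pos)


# Assumed numbers from the comment above, not measured values.
ppv = positive_predictive_value(prevalence=0.10, sensitivity=0.92, specificity=0.92)
print(f"PPV: {ppv:.3f}")                                    # PPV: 0.561
print(f"False positives among flagged items: {1 - ppv:.3f}")  # 0.439
```

So with a 10% base rate, a bit under half of everything the model flags would be a false alarm, even at 92% sensitivity and specificity. Lower the prevalence and the PPV drops further.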
What happens is it essentially collapses, as it requires a set of people to train the model. That means a set of people, with their biases, is training an AI to determine what is fake and what isn't.

Sounds like a pretty bad idea, especially if they decide to be gatekeepers of factual articles. It requires the entire team to know their biases one way or another, regardless of whether they think it's "right" or not.

Same here - articles that I wrote earlier (I'm not a native speaker) were classified as written by a machine :-D.

Edit: Nevertheless, it is still an amazing piece of work. The quality of the generated text is astounding.