| HN Mirror



	by evolutionas 2566 days ago
	If this is the state of the art, then I have some bad news. I recently finished writing my thesis and copy-pasted several paragraphs that did not include any mathematical formulas. Most of them (8 out of 12) were classified as written by a machine. That might be because I am not a native speaker. The model seemed to struggle most where mathematical details are discussed; on topics where I wrote more freely, such as conclusions and some analysis, it classified the text as written by a human.

2 comments

schmmd 2566 days ago

Thanks for sharing! If your content (such as a thesis) is out of domain (not a news article) all bets are off on how the model will perform.


hamburglar1 2566 days ago

Isn't that an issue? For instance, if someone made a video 'out of your domain' (e.g. from a different model than the internal training examples), how would the model perform? Would the AUC be impacted? What is the PPV? It seems common in these results that people are experiencing false positives; I did as well. If the percentage of fake news that we read is 10% and the model (AUC + operating point on an unpublished test set) has 92% sensitivity and specificity, we would still expect that nearly half (~44%) of model positives are actually negatives, i.e. false positives. If the "accuracy" is computed on an unbalanced dataset, what is to be taken from it?
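
The base-rate estimate above can be checked with Bayes' rule. This is only a minimal sketch using the numbers from the comment (10% prevalence, 92% sensitivity and specificity); the function name `positive_predictive_value` is made up for illustration and is not part of any detector's API.

```python
def positive_predictive_value(prevalence, sensitivity, specificity):
    """Share of positive predictions that are truly positive (Bayes' rule)."""
    # Fraction of all items that are fake and get flagged.
    true_pos = prevalence * sensitivity
    # Fraction of all items that are genuine but get flagged anyway.
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)


# Assumed inputs, taken from the comment: 10% fake news, 92% sens and spec.
ppv = positive_predictive_value(prevalence=0.10, sensitivity=0.92, specificity=0.92)
false_positive_share = 1 - ppv

print(f"PPV: {ppv:.3f}")
print(f"Share of flagged items that are false positives: {false_positive_share:.3f}")
```

Under these assumptions the PPV comes out a little above one half, so a bit under half of the flagged items would be false positives. A lower prevalence pushes the PPV down further, which is why accuracy reported on a balanced or unbalanced test set says little without the base rate.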


sieabahlpark 2565 days ago

What happens is that it essentially collapses, because it requires a set of people to train the model. That means a set of people, with their biases, is training an AI to determine what is fake and what isn't.

Sounds like a pretty bad idea, especially if they decide to be gatekeepers of factual articles. It requires the entire team to know their biases one way or another, regardless of whether they think it's "right" or not.


tasubotadas 2566 days ago

Same here - articles that I've written before (I'm not a native speaker) were classified as written by a machine :-D.

Edit: Nevertheless, it is still an amazing piece of work. The quality of the generated text is astounding.
