by paulgb
1093 days ago
> This has been a problem for years.

Hah, yep. I used to build tools to detect it at an adtech startup. A common approach at the time was to take someone else's text and run naive thesaurus replacement on it, so that it would be just barely comprehensible but statistically look like English. So “You can catch the mouse” might become “you jar trap the rodent”. Glad to see how far technology has progressed! /s

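For anyone curious what that kind of spinner looks like, here is a minimal sketch in Python. The three-entry dictionary and the `spin` function are made up for illustration (a real tool would presumably have used a much larger thesaurus), so treat it as a toy under those assumptions rather than what any actual spammer ran.

```python
import re

# Hypothetical toy thesaurus, invented for this example.
THESAURUS = {
    "can": "jar",
    "catch": "trap",
    "mouse": "rodent",
}


def spin(text, thesaurus=THESAURUS):
    """Swap every word that has a thesaurus entry, ignoring context."""

    def swap(match):
        word = match.group(0)
        # Look up the lowercased word; keep the original if there is no entry.
        return thesaurus.get(word.lower(), word)

    # Replace each run of letters independently of its neighbours.
    return re.sub(r"[A-Za-z]+", swap, text)


print(spin("You can catch the mouse"))  # prints "You jar trap the rodent"
```

Because the lookup ignores part of speech, the verb "can" gets the noun sense "jar", which is exactly the kind of slip that made this stuff detectable.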
There was some research a few years ago [2] into just how widespread this issue was in scientific publishing. The situation has likely only gotten worse with the introduction of higher-quality text generation LLMs.
[1]: https://ieeexplore.ieee.org/document/8597261
[2]: https://arxiv.org/abs/2107.06751