| HN Mirror



	by paulgb 1093 days ago
	> This has been a problem for years.

	Hah, yep. I used to build tools to detect it at an adtech startup. A common approach at the time was to take someone else's text and use naive thesaurus replacement so that it would be just barely comprehensible but statistically look like English. So “You can catch the mouse” might become “you jar trap the rodent”. Glad to see how far technology has progressed! /s
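
	For illustration only, the naive thesaurus replacement described above can be sketched roughly as follows. This is a minimal sketch, not any particular spammer's tool: the `TOY_THESAURUS` table and the `spin` function are hypothetical names made up for this example, and a real text spinner would presumably draw on a far larger synonym list.

```python
import re

# Hypothetical toy thesaurus, invented for this example.
# Each entry maps a word to one "synonym", with no regard for context.
TOY_THESAURUS = {
    "can": "jar",
    "catch": "trap",
    "mouse": "rodent",
}


def spin(text, thesaurus=TOY_THESAURUS):
    """Replace every word found in the thesaurus, ignoring context entirely."""

    def swap(match):
        word = match.group(0)
        replacement = thesaurus.get(word.lower())
        if replacement is None:
            # Word is not in the thesaurus: leave it untouched.
            return word
        # Carry over the capitalisation of the original word.
        return replacement.capitalize() if word[0].isupper() else replacement

    # Treat each run of ASCII letters as one word; punctuation and spaces pass through.
    return re.sub(r"[A-Za-z]+", swap, text)


print(spin("You can catch the mouse"))
```

	Because the lookup is per word and blind to part of speech, the modal verb "can" gets the synonym for the noun, which is exactly why the output is barely comprehensible while the word statistics still look plausible.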

3 comments

duskwuff 1093 days ago

And, unfortunately, it's still an ongoing problem. Some of it even ends up getting published by IEEE, e.g. [1].

There was some research a few years ago [2] into just how widespread this issue was in scientific publishing. The situation has likely only gotten worse with the introduction of higher-quality text generation LLMs.

[1]: https://ieeexplore.ieee.org/document/8597261

[2]: https://arxiv.org/abs/2107.06751

TechBro8615 1093 days ago

I've sometimes noticed this where a proper noun gets replaced with a synonym, e.g. "Bill Gates" becomes "Invoice Gates." Unfortunately that pattern only applies to the most bottom-of-the-barrel SEO spam. I expect ChatGPT output will be more subtle, but if a lazy spammer doesn't obfuscate it enough, there will still be some tells - e.g. the five-paragraph essay format with a conclusion beginning with "overall..."

vintermann 1093 days ago

Language models have seen enough of this genre of gibberish already that they can imitate it flawlessly.

https://twitter.com/janellecshane/status/1280915484351754240

stef25 1093 days ago

Amazon here in Europe is overrun with that kind of mistranslation, entered by product sellers who just run everything through Google Translate. Still amazing that a player as big as Amazon accepts all this content, which is just one step above gibberish.

seanp2k2 1093 days ago

Yay, the arms race between the ROI and computational power required to generate spam and the ROI and computational power required to tell whether something is indeed spam: the inescapable cat-and-mouse game in which users just try to find something that isn't garbage while a million hucksters attempt to dupe them into buying their garbage products.
