Hacker News
by uberswe 701 days ago
> Everything pre-2022 is definitely written by humans

I'm not sure whether methods like article spinning count as written by humans. Spinning could be automated before AI: a tool would take a human-written article and randomly swap words for others with a similar meaning throughout, to make it seem original.
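The word-swapping step of such a spinner might look roughly like the following minimal Python sketch. The synonym table `SYNONYMS` and the `spin()` helper are hypothetical illustrations, not any particular tool's implementation; real spinners used much larger thesauri.

```python
import random
from typing import Optional

# Hypothetical, hand-made synonym table (illustrative only).
SYNONYMS = {
    "quick": ["fast", "rapid", "speedy"],
    "article": ["post", "piece", "write-up"],
    "important": ["key", "crucial", "significant"],
    "shows": ["demonstrates", "reveals", "indicates"],
}


def spin(text: str, swap_prob: float = 0.5, seed: Optional[int] = None) -> str:
    """Randomly replace known words with a synonym, leaving other words as-is."""
    rng = random.Random(seed)
    out = []
    for word in text.split():
        # Split off trailing punctuation so "article," still matches "article".
        core = word.rstrip(".,;:!?")
        tail = word[len(core):]
        options = SYNONYMS.get(core.lower())
        if options and rng.random() < swap_prob:
            replacement = rng.choice(options)
            # Keep a leading capital letter if the original word had one.
            if core[:1].isupper():
                replacement = replacement.capitalize()
            out.append(replacement + tail)
        else:
            out.append(word)
    return " ".join(out)


print(spin("This quick article shows an important result.", swap_prob=1.0, seed=0))
```

Because the swaps are random, each call can yield a different "original" variant. Note that nothing checks whether the substitute fits the context, which is part of why spun text often reads oddly.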


Don’t forget machine-translated texts: until ~2017 the translation was likely done by something much dumber and more semantically lossy than an LLM, and after 2017 it was basically done by an early form of LLM (the Transformer architecture came out of Google's machine-translation research).

For this reason, many English-language news reports published from 1998 (the Babelfish era) until a few months ago on the English-language websites of news outlets in non-English-speaking countries may be unreliable training data.