by jackcviers3 1868 days ago
6 and 2 on hard mode. The failure of the model to connect ideas in long paragraphs (or to make a succinct claim) is what gives it away. It introduces far too many terms with far too little repetition and far too much specificity in such a short span.

Suggested tweak: train it against papers written by people with an Erdős number < 3 (or Feynman contributors, etc.), so that the topics and fake topics are more closely related in style and content. Maybe even feed it some of their professional letters as well. That would produce some very hard-to-decipher fakes.

Another great corpus for complex writing is public law books. Have it compare real laws from the training set with fake laws. I bet it would be very difficult to figure out the fake laws.

Train one of these on the entire corpus of a single author (Roger Ebert, Justice Ginsburg, Joyce, anyone with a large enough body of work), and spotting the fake paragraphs among the real ones would be very, very difficult. An entire text, however, would likely be discernible.

It is getting really, really close to being able to fool any layman, though. Impressive work!