
Yup, it's only going to get worse. At least for now, though, it's difficult for these models to generate long news articles that stay coherent.

> mean human accuracy at detecting articles that were produced by the 175B parameter model was barely above chance at ∼52% [...] Human abilities to detect model generated text appear to decrease as model size increases [...] This is true despite the fact that participants spend more time on each output as model size increases [1]

> for news articles that are around 500 words long, GPT-3 continues to produce articles that humans find difficult to distinguish from human written news articles [1]

[1] Brown et al., "Language Models are Few-Shot Learners" (2020), https://arxiv.org/pdf/2005.14165.pdf