Yup it's only going to get worse - at least for now, it's difficult for these models to generate long news articles that are coherent.
> mean human accuracy at detecting articles that were produced by the 175B parameter model was barely above chance at ∼52% [...] Human abilities to detect model generated text appear to decrease as model size increases [...] This is true despite the fact that participants spend more time on each output as model size increases [1]
> for news articles that are around 500 words long, GPT-3 continues to produce articles that humans find difficult to distinguish from human written news articles [1]
> mean human accuracy at detecting articles that were produced by the 175B parameter model was barely above chance at ∼52% [...] Human abilities to detect model generated text appear to decrease as model size increases [...] This is true despite the fact that participants spend more time on each output as model size increases [1]
> for news articles that are around 500 words long, GPT-3 continues to produce articles that humans find difficult to distinguish from human written news articles [1]
[1] https://arxiv.org/pdf/2005.14165.pdf