by rnosov 1208 days ago
Correct output is desirable. If you feed it nonsense, whether human- or AI-generated, you might break it.

Then we should encourage labeled ChatGPT content like ShareGPT, which can easily be excluded from future datasets because it is clearly labeled as AI-generated.

It's the content that isn't labeled as generated with ChatGPT et al. that will end up in future training sets. Personally, I think that's taking the "lossy JPEG" analogy too far, but I'm not an AI researcher.