| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by JieJie 1208 days ago
	Then we should encourage labeled ChatGPT content like ShareGPT, which can be easily avoided in future datasets because it is clearly labeled as AI-generated content. It's the stuff that isn't labeled as generated with ChatGPT, et al, that will enter future training sets. I personally believe that's taking the "lossy JPEG" analogy too far, but I'm not an AI researcher.