| HN Mirror



	by albert_e 388 days ago
	We are 2.5 years into a world with access to ChatGPT, and tons of data are being produced and published with the help of LLMs. How are these labs filtering out AI-generated content so that it does not enter the training data? (Other than, of course, synthetic data they deliberately generate themselves for the purpose of training.)