| HN Mirror



	by null_point 908 days ago
	I suspect this may delay some short-term progress by creating pressure on AI labs to train their models on data curated or synthesized in a way that is conscientious of copyright law. There are already troves of data that are fair game for training, but even "corrupted" data sets can probably be used if used intelligently. We've already seen examples of new models effectively being trained off of GPT-4. That approach, combined with filters for copyrighted material, might yield data that is sufficiently "scrambled". Not to say building such a filter is definitely easy, but it seems plausible.