


	by esha_manideep 805 days ago
	Pretty amazing to see training data being discussed more openly

1 comments

WiSaGaN 805 days ago

Indeed. I think part of the reason it is not discussed openly may be that much of the data used is copyrighted, which introduces some legal ambiguity.


YetAnotherNick 805 days ago

IANAL, but hiding something doesn't make anyone legally immune. Any company could sue an LLM company, and the training data couldn't stay hidden during the case. For example, there is already a similar case against OpenAI.


fl0id 805 days ago

Yes, but at the very least it delays any findings while you rake in the cash and try to create a favorable environment. OpenAI even stated that they think using copyrighted texts is necessary and should be covered by fair use.
