| HN Mirror



	by cma 1205 days ago
	I think there's probably nothing wrong with training on others' ChatGPT transcripts posted on the open web. OpenAI trains on source-available projects with non-commercial terms, so their lawyers have already been over a similar case and decided it should be fine.


ralfn 1205 days ago

Not just that: Imagine OpenAI going to court and establishing the legal precedent that makes their own product illegal.

So OpenAI can claim whatever they like; there is no way they will ever pursue legal action, unless their intent is to lose the court case and thereby establish the precedent that it is okay to train on random data scraped from the internet.

We would also get into a weird situation anyhow, where it is hard or impossible to prove whether all, some, or none of the information in a dataset was curated by humans. So in the worst case, we will have companies working with human curators (but secretly supplementing with gray-sourced materials) during their training. Just like how it's hard to get 100% slave-free coffee beans or cacao.


anentropic 1204 days ago

I don't think it's about things being illegal per se.

It's that they can sue you because, by making a competing product with data obtained by using their product, you contravened their terms & conditions for using their product.


ralfn 1204 days ago

But so did they when they scraped the web for content.

That's not within anyone's terms and conditions except Wikipedia's.

That's what I mean by precedent. If OpenAI won that case, they would be sued in turn by Bloomberg, for example.
