Hacker News
by spxneo 811 days ago
So you think scraping copyrighted content to sell ads is okay, and downloading copyrighted games for free is also okay. Then why is it not okay for ChatGPT to train itself on scraped content?
2 comments

It's not scraping, it's indexing and linking out to creators. LLMs are helping themselves to everything with no regard for content creators. They should be subject to copyright claims — I don't care if it destroys their business, they should've considered that at the outset. They didn't then and they don't care to now, they're simply greedy and looking to build something that benefits themselves and their investors with no regard for anyone they step on to do so.
But how can you prove that your picture of a cat was used to train an LLM?

If you owned a franchise called "Chicken Brothers" with a logo of two chickens standing side by side with arms crossed proudly, do you have a claim over all derivatives, including the Spanish name generated by an LLM?

I just don't think it's straightforward. The main complaint should be about payout for licensing the content used during training, but that's tough to prove unless someone at OpenAI dumps the AWS CloudWatch logs.

That's OpenAI's problem and the burden should be on them.
The first part is fine because the search engine blurb isn’t a replacement for the thing itself. And I disagree with what ROM sites claim: you can’t just dump ROMs online and claim it’s not copyright infringement.