Hacker News
by foundart 545 days ago
Courts in the relevant jurisdictions don't work on "no one really disputes."

It would have to be _proven_ in a court, which involves evidence and testimony, and if the whistleblower was in a good position to provide credible testimony, then his death would likely make it harder to prove that copyright violations have taken place.

I'm pretty sure any competent lawyer would stipulate that, in many/most cases, training is happening on copyrighted information. I'm also pretty sure that OpenAI is not arguing that all of its training data is either licensed or material it owns the copyright to. (Some companies, perhaps Adobe?, have been more conservative.) Perhaps I'm wrong. But I haven't heard that argument publicly and I would need to be convinced.
Discovering certain types of data were gathered and used would be much worse.

Training on CNN and Netflix content = i sleep

Training on private personal and corporate inboxes, medical records, and illegal content, purchased from blackhat data brokers = real shit

A Kenyan data labeler famously cut ties with OpenAI after OpenAI asked them to gather CSAM.

Citation on that?
Gather and label are two wildly different things, and the difference changes the entire context. They aren't saying "go find this stuff for us"; they are saying that if people upload it or you find it in the data, label it as such.
It only changes who actually gathered the CSAM they asked this person to label. OpenAI definitely gathered it.
Courts in the relevant jurisdictions don't work on "no one really disputes."

It’s called a Motion for Summary Judgment.