| HN Mirror


	by foundart 545 days ago
	Courts in the relevant jurisdictions don't work on "no one really disputes." It would have to be _proven_ in a court, which involves evidence and testimony, and if the whistleblower was in a good position to provide credible testimony then his death would likely make it harder to prove that copyright violations have taken place.

2 comments

ghaff 545 days ago

I'm pretty sure any competent lawyer would stipulate that, in many/most cases, training is happening on copyrighted information. I'm also pretty sure that OpenAI is not arguing that all their training data is either licensed or material they own the copyrights to. (Some companies, perhaps Adobe?, have been more conservative.) Perhaps I'm wrong. But I haven't heard that argument publicly and I would need to be convinced.

HeatrayEnjoyer 545 days ago

Discovering certain types of data were gathered and used would be much worse.

Training on CNN and Netflix content = i sleep

Training on private personal and corporate inboxes, medical records, and illegal content, purchased from blackhat data brokers = real shit

A Kenyan data labeler famously cut ties with OpenAI after OpenAI asked them to gather CSAM content.

BadHumans 545 days ago

Citation on that?

upghost 545 days ago

https://www.wsj.com/articles/chatgpt-openai-content-abusive-...

https://www.bigdatawire.com/2023/01/20/openai-outsourced-dat...

https://www.theguardian.com/technology/2023/aug/02/ai-chatbo...

https://www.businessinsider.com/openai-kenyan-contract-worke...

https://www.medianama.com/2023/07/223-kenyan-workers-call-fo...

They were asked to label CSAM, to clarify.

BadHumans 544 days ago

Gather and label are two wildly different things that change the entire context. They aren't saying go find this stuff for us; they are saying that if people upload it or you find it in the data, then label it as such.

hansvm 544 days ago

It only changes who actually gathered the CSAM they asked this person to label. OpenAI definitely gathered it.

tiahura 545 days ago

> Courts in the relevant jurisdictions don't work on "no one really disputes."

It’s called a Motion for Summary Judgment.
