by HeatrayEnjoyer 545 days ago
Discovering certain types of data were gathered and used would be much worse.

Training on CNN and Netflix content = i sleep

Training on private personal and corporate inboxes, medical records, and illegal content, purchased from blackhat data brokers = real shit

A Kenyan data labeler famously cut ties with OpenAI after OpenAI asked them to gather CSAM content.

1 comments

Citation on that?
Gathering and labeling are two wildly different things, and the difference changes the entire context. They aren't saying "go find this stuff for us"; they are saying that if people upload it or you find it in the data, label it as such.
It only changes who actually gathered the CSAM they asked this person to label. OpenAI definitely gathered it.