by 0xblood 1254 days ago
Almost more interesting, I found, is that according to the article Sama, the Kenyan company, was also asked to collect CP and brutal/gore images for OpenAI. After delivering 1400 images, Sama cancelled the contract, as even this was too much for them. OpenAI then talked about a "miscommunication" and said they did not actually want those CP images. Well, they made a separate category for it and asked Sama to collect images of several other categories, but somehow it was lost in communication that for one of these categories, no images should be collected. Because they are illegal. OpenAI swears they never opened the images they received (and paid for).
5 comments

Is there a source for this claim? This would be life-ending for OpenAI if true.

edit: This is mentioned in the original article. From TFA:

>Sama delivered OpenAI a sample batch of 1,400 images. Some of those images were categorized as “C4”—OpenAI’s internal label denoting child sexual abuse—according to the document. Also included in the batch were “C3” images (including bestiality, rape, and sexual slavery,) and “V3” images depicting graphic detail of death, violence or serious physical injury, according to the billing document. OpenAI paid Sama a total of $787.50 for collecting the images, the document shows.

> Within weeks, Sama had canceled all its work for OpenAI—eight months earlier than agreed in the contracts. The outsourcing company said in a statement that its agreement to collect images for OpenAI did not include any reference to illegal content, and it was only after the work had begun that OpenAI sent “additional instructions” referring to “some illegal categories.” “The East Africa team raised concerns to our executives right away. Sama immediately ended the image classification pilot and gave notice that we would cancel all remaining [projects] with OpenAI,” a Sama spokesperson said. “The individuals working with the client did not vet the request through the proper channels. After a review of the situation, individuals were terminated and new sales vetting policies and guardrails were put in place.”

> Well, they made a separate category for it and asked Sama to collect images of several other categories, but somehow it was lost in communication that for one of these categories, no images should be collected.

Be careful here. Read the article closely; here is what it says:

> [Sama] said in a statement that its agreement to collect images for OpenAI did not include any reference to illegal content, and it was only after the work had begun that OpenAI sent “additional instructions” referring to “some illegal categories.”

Note how carefully this is worded - if Time could confidently say that OpenAI asked for C4 images, they would have absolutely put that in the article. Now, this is filtered through PR statements, but it reads to me like a poorly-worded email went out from OpenAI that didn't actually ask for C4 images, but one of Sama's employees interpreted it as an ask, and started collecting without raising any red flags up the chain. And they got fired for it.

If your interpretation were true, it would be hard to understand why Sama also cancelled the entire deal with OpenAI. It's much more likely that a (possibly rogue) employee of OpenAI asked an employee of Sama for those images explicitly as part of additional work on the existing contract. The Sama employee agreed, but when the higher-ups found out, they fired them and cancelled the whole deal, since they were not comfortable handling this material (whether for legal reasons, moral reasons, or both is of course unknowable).
Plausible deniability. "Rogue employee" is what they always use when doing damage control.
I think you're reading too much into this. The natural reading is quite clearly that the illegal categories referred to were or included "C4", and it'd be highly unethical for them to have framed the paragraph in that manner if they believed that not to be the case. It's worth noting that OpenAI's PR statement only goes so far as to call the situation a miscommunication, and doesn't directly assign blame to Sama, while Sama explicitly claims OpenAI asked for illegal categories in subsequent instructions.

Also, I'm not up on the laws regarding this stuff, but are the other, awful categories illegal to collect? If not, there's not much room for ambiguity.

“if Time could confidently say that OpenAI asked for C4 images, they would have absolutely put that in the article.”

All your word salad ignores this reality.

Directing someone to commit a crime is still a crime. OpenAI most likely has criminal liability in this instance and the FBI should open an investigation if they haven't already.
Crime is not the appropriate word, but there's clearly a well-documented history of abusive conditions for data labeling workers [1] [2] [3]

For those looking for an AI data labeling service that also tries to help workers along the way, here's a plug for the company I started, "dataprep.be" [4]

We have a small preference for working with workers with special needs (deaf, mute and employees with small handicaps). Public subsidies for these types of workers help our case in the EU, as do constraints on some public institutions in the EU to hire more handicapped workers.

With clients more sensitive to costs, we also work with remote data labelers from developing countries. We help put checks in place to limit forced and child labor. We pay 5% extra so they have time to learn high-demand tech skills. (Using Khan Academy and free access to a normally $250/year Datacamp subscription)

Happy to work with the HN crowd or just receive feedback and mentoring! (My email is in my profile)

[1] https://www.vice.com/en/article/88apnv/underpaid-workers-are...
[2] https://gizmodo.com/horror-stories-from-inside-amazons-mecha...
[3] https://www.thebureauinvestigates.com/stories/2022-10-20/beh...
[4] https://www.dataprep.be

The fact that you refer to workers in "developing countries" as being leveraged for clients "more sensitive to costs" should tip you off that what you're doing is exploitative and dehumanising.
Criminal liability for someone misunderstanding a request and doing what they were not asked to do?

Where did you earn your law degree?

You have it backwards; OpenAI was the one that made the request. OpenAI claimed that the Kenyan company misunderstood the request. Your parent comment is claiming that OpenAI is criminally liable for making the request.
damn, even if you get big, death is still there
Somewhat shockingly, her site (from the TC link) has been taken over by a domain squatter with gambling links.

Surely 3 years is too little time to be forgotten.

I suppose none of us gets to go to Kislovodsk ..

https://www.youtube.com/watch?v=XrAGf2EMAbs

Where did you find this?