by SlinkyOnStairs 49 days ago
> How else do you want companies to remove and prevent CSAM?

Different situation.

Facebook has to do CSAM moderation because it's a publishing platform. People will post CSAM on Facebook, so it must do moderation.

And "just don't have Facebook" isn't a solution, because every publication of any sort has to deal with this problem; any newspaper accepting mail has it (albeit at a much smaller scale). People have been nailing obscene things to bulletin boards for all of recorded history.

---

In contrast, OpenAI has no such problem. It did not have CSAM pushed onto it, it actively collected such data itself. It could have, at any point before and after, simply stopped scraping all of the web indiscriminately and switched to using more curated sources of scraped data.

The downside would be "worse LLMs" or "LLMs being created later", which is a perfectly acceptable compromise.

---

This is not to say that genuine content flagging firms have no reason to curate such data & build tools to automatically flag content before human moderators have to. (But then they also shouldn't be outsourcing this and traumatizing contract workers for $2-3 an hour)

But OpenAI is not such a firm. It's a general AI company.

6 comments

> traumatizing contract workers for $2-3 an hour)

Is there an hourly rate at which this should be acceptable?

There's no dollar amount, but proper support during and after employment is a minimum, and a large paycheque will both offset some of the human cost and make it easier for people to be pushed to quit the job, so that they aren't doing it for too long.

The current support systems for police in this subject are already insufficient. Facebook's treatment of their moderation staff is abhorrent. The point of including the pay figure is to further illustrate just how damning this subcontracting practice is.

There is labor that is necessary for our societies to function, but a direct threat to the people doing the work. Someone has to do it, and it should be seen as a great service to society and rewarded accordingly. In a just world, we would be paying significantly extra for threats to health that come from work; in the one we are currently in, we use the threat of worse harm instead.

> Someone has to do it, and it should be seen as a great service to society and rewarded accordingly

You are just too privileged to understand people: many people would be glad to do it for the minimum wage. I would fight to have that opportunity (I live in west EU).

We have coal miners destroying their bodies and lungs, cobalt mining slavery, cocoa nut child labour and de facto slavery, sex workers, CPS investigators, first responders, and doctors with high rates of suicide…

Not only is there an acceptable market rate for trauma, it’s sometimes competitive and requires licensing.

There is one difference between first responders/doctors and the other classes (and the moderators under discussion here):

First responders/doctors/CPS investigators see the worst but they also have days where they make a difference. Save a life or multiple lives. I'm sure it's a huge part of what makes the job bearable, and to some meaningful.

I'm not discounting your point about high rates of suicide either. If anything, when you take away any good days, you're left, as a content moderator, with just seeing the worst of the world day in, day out, with nothing to make it meaningful. I'd suggest that's something we as a society should not tolerate as being an acceptable trade for the ability to share cat photos.

>First responders/doctors/CPS investigators see the worst but they also have days where they make a difference. Save a life or multiple lives. I'm sure it's a huge part of what makes the job bearable, and to some meaningful.

You think miners don't make a difference or save lives?

> You think miners don't make a difference or save lives?

Do you think miners mining is saving lives in the same way that doctors saving lives is saving lives?

To continue the parent's point, do you think miners derive a deep or powerful satisfaction from some of their mining work which might offset some of the heavy cost it has on them physically and emotionally?

I think miners save more lives (through the supply of gas, energy, battery materials, pesticides, fertilizers, solar panel minerals, and ultimately electricity, computing materials, etc) than doctors do.

And I think what prevents miners from being able to "derive a deep or powerful satisfaction from some of their mining work which might offset some of the heavy cost it has on them physically and emotionally" is not anything inherent in their work, but people thinking that only direct effect should be prestigious and satisfying, and underappreciating the thankless background work that keeps the lights on.

Same way people sneer at cleaning people or teachers and their meagre salaries and no respect, or domestic labor.

Emergency Department^ doctors, what do they make? Pay that to the people who have to review the worst humanity has to offer. And while we're at it, ambulance personnel should get a huge pay bump. Take it from nurses' pay.

^ i originally said "triage doctors" but i meant the resident ER doc.

Why take from other workers when it can be siphoned from upper management and shareholders?

you're right, it's a personal failing that i must snip at nurses whenever the word appears in my head. Apologies.

ER triage is usually done by a nurse, at least in England.

Rookie police officers in my country are paid 2500 euro per month and they have to deal with the underbelly of society.

They have access to better counselling and are ostensibly trained for the job. But there are still suicides.

OpenAI runs ChatGPT where users submit text and photos and OpenAI generates and sends text and photos back. So users could be submitting CSAM. And yes, OpenAI could be generating CSAM. It's not limited to being a pull operation. What am I missing?

What you're missing is that they're "separate" parts of the business.

The core Facebook product is users' posts. It's not possible to separate those two. Nor can one downscale Facebook in a way that stops the problem; as mentioned above, Facebook has this problem because it's a problem we've had since the medieval days of the town bulletin board.

With OpenAI, the way ChatGPT was built and user submissions are separate things. The GPT models could have been trained without this mess. OpenAI could be more selective in what data it scrapes.

While OpenAI cannot stop users sending god knows what in their prompt text and images, OpenAI can choose to not interact with that data beyond the minimum legal retention, by e.g. not using it for training the next generation of models. This would massively downscale the problem.

AI output is another such problem, where A) maybe this'd be less of a problem if they hadn't recklessly included a bunch of CSAM in the training data by accident, and B) LLMs just aren't the kind of fundamental human right that "having a public opinion" is. It would be fine if they were less good, invented years later, or even not invented at all.

The main counterargument to the latter has been the "But China is inventing evil AI" spiel, which is fairly weak. If China builds an orphaned baby crushing machine, we do not need to build an orphaned baby crushing machine of our own. (And the reality is that China is only chasing AI so aggressively because the west does. They're reasonable people, it would have been entirely possible for both the west and China to make a mutual "no orphan crushing" agreement and just accept slower rollout of technology. This is exactly what has been done with human genetic engineering, and China did in fact enforce these norms.)

People upload images to OpenAI and have it generate and modify them. And it has to not generate CSAM.

I guess that they process billions of images every day.

I don’t think they’re getting CSAM from scraping (thankfully, I expect there isn’t much publicly available CSAM).

They aren’t as big as Facebook, but they must have this functionality or many users will be hurt.

> In contrast, OpenAI has no such problem. It did not have CSAM pushed onto it, it actively collected such data itself. It could have, at any point before and after, simply stopped scraping all of the web indiscriminately and switched to using more curated sources of scraped data.

You've just thrown the garbage over your fence. Instead of OpenAI contracting Sama to classify CSAM, the "Curators" have to.

At the end of the day, someone needs to classify it. If you say the platforms need to, and they miss some, and it ends up in OAI training data, OAI is going to be the entity paying the price.

Not really different. They would need to report CSAM if it is ever uploaded by a user.

Any website that allows users to upload videos needs some sort of service that can identify and report CSAM.

> In contrast, OpenAI has no such problem. It did not have CSAM pushed onto it, it actively collected such data itself. It could have, at any point before and after, simply stopped scraping all of the web indiscriminately and switched to using more curated sources of scraped data.

This is of course incredibly illegal, but megacorps (by valuation) and oligarchy members are above the law, so who cares. I assume there could be a regulatory framework which can make this legal for an extremely specific purpose, but there is zero chance that OpenAI was part of this or abiding by it in 2022, absolutely none.