by dijksterhuis 8 days ago
> The missing image was described as "graphic" or "violent."

not in the first prompt, which kicked the whole thing off. no mention of the type of content was provided. the model generated dark outputs when not given any direction on the type of content.

the rest of the prompts are just showing “yeah, you can tweak this and get even worse stuff”.


> the model generated dark outputs when not given any direction on the type of content.

I would argue it actually was, in that it was specifically asked to "not censor or filter" the content. This implies that the content is otherwise worthy of censoring and filtering.

I don't know how willing I am to credit an LLM with that much reasoning, but insofar as every extremely pro-AI person constantly tells me how smart these models are, this seems like a pretty short logical leap to me.

the main reason these images turn up is that they're in the training data. and the images are common enough in the training data for the content to come out without being explicitly asked for (in the first prompt).

if those images didn’t exist in the training data we wouldn’t be having this conversation.

This is one of the core problems with these models. They’re relying on filtering to work against ever more jailbreaks, instead of analyzing the training sets and filtering out the material prohibited for the model’s end use before training them anew. You can’t make satisfying facsimiles of things that you don’t know about.

I’m still waiting for companies or congressmen to get their heads on straight and get some common sense going.

i bet that the number of CSAM images in the training data for these models is >1

> instead of analyzing the training sets and filtering out the material prohibited for the model’s end use before training them anew. You can’t make satisfying facsimiles of things that you don’t know about.

absolutely yes, but that would cost mo' money :shrug: that's the reason why they don't do it.

Yep, the first image was described as "I apologize for the picture's content." What do you expect to get from that? Cats frolicking in the grass?
A picture of me in my swimsuit maybe lol

A gross meal i made when drunk? A mess my cat made? Text containing a slur?

A cringe meme?

If my friends opened a text with "sorry for this image" i am not imagining rape victims

ChatGPT images (without additional context) come from a generalized understanding of what people tend to apologize for when asking for an image restoration. It looks like the training data suggests sexualized imagery.

Regarding rape vs BDSM: https://pmc.ncbi.nlm.nih.gov/articles/PMC10236207/ That is, going by visual cues alone might be unreliable.