by ToucanLoucan 8 days ago
> the model generated dark outputs when not given any direction on the type of content.

I would argue it actually was, in that it was specifically asked to "not censor or filter" the content. This implies that the content is otherwise worthy of censoring and filtering.

I don't know how much reasoning I'm willing to credit to an LLM, but insofar as every extremely pro-AI person constantly tells me how smart these models are, this seems like a pretty short logical leap to me.


the main reason these images turn up is that they’re in the training data, and they’re common enough there for the content to come out without being explicitly asked for (in the first prompt).

if those images didn’t exist in the training data we wouldn’t be having this conversation.

This is one of the core problems with these models. They’re relying on filtering to work against ever more jailbreaks, instead of analyzing the training sets and filtering out the material prohibited for the model’s end use before training them anew. You can’t make satisfying facsimiles of things that you don’t know about.

I’m still waiting for companies or congressmen to get their heads on straight and get some common sense going.

I bet that the number of CSAM images in the training data for these models is >1.

> instead of analyzing the training sets and filtering out the material prohibited for the model’s end use before training them anew. You can’t make satisfying facsimiles of things that you don’t know about.

absolutely yes, but that would cost mo' money :shrug: that's why they don't do it.