Hacker News
by kisper 8 days ago
This is one of the core problems with these models. They're relying on filtering to work against ever more jailbreaks, instead of analyzing the training sets, filtering out the material that's prohibited for the model's end use, and then training them anew. You can't make satisfying facsimiles of things that you don't know about.

I’m still waiting for companies or congressmen to get their heads on straight and get some common sense going.

I bet the number of CSAM images in the training data for these models is >1.

> instead of analyzing the training sets, filtering out the material that's prohibited for the model's end use, and then training them anew. You can't make satisfying facsimiles of things that you don't know about.

Absolutely, but that would cost more money, and that's why they don't do it. 🤷