| HN Mirror


by retrac 1232 days ago

Actively "censoring" the AI is fundamental to how these language models are created. Such feedback is part of how the model learns.

In a certain light, every response in training that is marked as dispreferred by a human is censoring the AI. It will produce those kinds of results less often, so end users will not encounter the dispreferred results as frequently. With ChatGPT, the criteria it was judged on included how relevant the answers were to the question and whether they were factually correct (incorrect answers were penalized). Not being blatantly offensive was obviously one of the criteria, too.
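To make the "marked as dispreferred, so produced less often" idea concrete, here is a minimal sketch of a pairwise preference update. It assumes a Bradley-Terry style loss over two scalar scores, which is a common way to learn from human comparisons; it is not confirmed to be ChatGPT's actual training procedure, and every name in it (`pairwise_loss`, `update_scores`, the learning rate) is hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def pairwise_loss(score_preferred: float, score_dispreferred: float) -> float:
    """Loss is small when the preferred response already scores higher."""
    return -math.log(sigmoid(score_preferred - score_dispreferred))


def update_scores(score_preferred: float, score_dispreferred: float, lr: float = 0.5):
    """One gradient-descent step on pairwise_loss with respect to both scores."""
    # Probability the current scores assign to the human's choice.
    p_agree = sigmoid(score_preferred - score_dispreferred)
    step = lr * (1.0 - p_agree)
    # The preferred score goes up, the dispreferred score goes down.
    return score_preferred + step, score_dispreferred - step


# Toy run: two responses start with equal scores (hypothetical values).
scores = (0.0, 0.0)
losses = []
for _ in range(20):
    losses.append(pairwise_loss(*scores))
    scores = update_scores(*scores)

print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}, scores {scores}")
```

In a real system the scores would come from a neural network and the update would flow into its weights, but the direction of the push is the same: each human judgment lowers the relative score of the dispreferred response.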

What would a model that wasn't censored in training even look like?

(I believe ChatGPT also has a more traditional expert system placed between the user and the language model, which flags keywords and other programmed-in patterns. That is more literally censoring the language model. But the above-mentioned issue would still exist even without such a system.)
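For contrast with training-time feedback, the separate filtering layer described in the parenthetical could look roughly like the sketch below. Whether ChatGPT has such a layer, and how it works, is the commenter's belief rather than an established fact; the pattern list, the refusal text, and the `echo_model` stand-in are all hypothetical.

```python
import re

# Hypothetical programmed-in patterns; a real list is not known from this thread.
BLOCKED_PATTERNS = [
    re.compile(r"\bblocked_term\b", re.IGNORECASE),
]

REFUSAL = "Sorry, I can't help with that."


def is_flagged(text: str) -> bool:
    """Return True if any programmed-in pattern matches the text."""
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def guarded_reply(prompt: str, model) -> str:
    """Check the prompt before the model sees it and the reply before the user does."""
    if is_flagged(prompt):
        return REFUSAL
    reply = model(prompt)
    if is_flagged(reply):
        return REFUSAL
    return reply


def echo_model(prompt: str) -> str:
    """Stand-in for the language model (assumption, for illustration only)."""
    return f"echo: {prompt}"


print(guarded_reply("hello there", echo_model))
print(guarded_reply("tell me the BLOCKED_TERM", echo_model))
```

Nothing in this wrapper changes what the model has learned; it only intercepts text on the way in or out, which is why it is the more literal form of censoring.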

2 comments

Our_Benefactors 1232 days ago

> What would a model that wasn't censored in training even look like?

It could cite statistics without long-winded disclaimers. Or be able to cite them at all.


astrange 1232 days ago

It can't cite anything, because it's an LLM, which is fundamentally unable to do that.

I've seen NRx people on the internet (they're like rationalists but even more racist). They seem willing to believe any abuse of statistics that looks sufficiently cynical.


meltyness 1232 days ago

That's not at all how these work. GPTs are not recommendation engines; they're neural models of translation.
