There are many dimensions of appropriateness. For example, it's inappropriate to answer in Swahili when the question was asked in English, unless explicitly asked to. It's also inappropriate to output complete gibberish.

The model seems to have generalized the specific type of inappropriateness it's trying to avoid into something similar to "being good".