by OgsyedIE 484 days ago
If every LLM response is a vector in their embedding space, aren't these misaligned replies just the effect of multiplying whatever vector components represent "appropriateness" by -1?
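To make the question's "multiply by -1" idea concrete, here is a toy sketch. It assumes, purely hypothetically, that "appropriateness" corresponds to a single direction d in the embedding space (real models need not have such a direction). Negating the component of a vector v along a unit vector d is a reflection across the hyperplane orthogonal to d: v' = v - 2(v·d)d. The axis labels and values below are made up for illustration.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(d):
    # Scale the direction to unit length so the projection formula holds.
    n = math.sqrt(dot(d, d))
    return [x / n for x in d]

def flip_component(v, direction):
    """Negate v's component along `direction`, keeping everything orthogonal to it.

    Reflection across the hyperplane orthogonal to d: v' = v - 2 (v . d) d.
    """
    d = normalize(direction)
    k = 2 * dot(v, d)
    return [x - k * y for x, y in zip(v, d)]

# Hypothetical toy 3-D "embedding":
# axis 0 = "appropriateness", axis 1 = "language", axis 2 = "fluency".
appropriateness = [1.0, 0.0, 0.0]
v = [0.8, 0.5, 0.3]
print(flip_component(v, appropriateness))  # [-0.8, 0.5, 0.3]
```

Only the component along the chosen direction changes sign; everything orthogonal to it (here the hypothetical "language" and "fluency" axes) is left untouched.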

There are a lot of dimensions of appropriateness. For example, it's inappropriate to answer a question asked in English in Swahili unless explicitly asked to. Likewise, it's wrong to output complete gibberish.

The model seems to have generalized the specific type of appropriateness it's trying to avoid into something similar to "being good".