Hacker News
by slackr 333 days ago
Very interesting. I wonder if fine-tuning an LLM to accept a double standard on an isolated moral or political matter would result in the same wider misalignment. Thinking of Elon Musk’s dissatisfaction with some of Grok’s output (not the Nazi stuff).