Hacker News
by solid_fuel 358 days ago
Biases inserted into the prompts are a very crude way to bias an LLM, and indeed don't work well. They result in very visible deviations, like when Grok decided to inject "white genocide" into all sorts of unrelated topics.

The real danger of model biasing, though, is in the training stages, where you can be selective about the source material you train on. With carefully constructed training data, I am certain you could even bias an LLM to actively steer conversations away from topics you want to avoid.