|
|
|
|
|
by jsenn
52 days ago
|
|
The article you are responding to showed that a strange LLM behaviour was caused by a training signal that was explicitly designed to produce that type of behaviour. They were able to isolate it, clearly demonstrate what happened, and roll out a mitigation using a mechanism they engineered for exactly this type of thing (the developer prompt). That doesn’t sound like sorcery to me. If anything I’m surprised you can so easily engineer these things! |
|
There is probably a whole testing workflow at AI companies to tweak each new model until it "looks" acceptable.
But they still don't understand what they are doing. This is purely empirical.