by redfloatplane
28 days ago
This made me laugh. Training Opus 4.7 on business skills caused it to sometimes exhibit dishonest behaviour, and not training 4.8 on those skills removed it. From the system card:

> 6.2.5 External testing from Andon Labs

> Andon Labs reviewed the behavior of Claude Opus 4.8 in their simulated Vending-Bench 2 retail-management evaluation, as reported in the Capabilities section of this system card (see Section 8.13.5). Although they did observe some unexpected capability failures, they did not find clear instances of the kind of concerning in-game behaviors that were discussed in other recent system cards.

> What might have led to these differences? We monitor and investigate the effects of different training environments on alignment; Claude Opus 4.7, for example, had training that focused on business skills and robustness against adversarial agents, but we discovered that this training inadvertently contributed to misaligned behavior including dishonesty. We therefore removed it for Opus 4.8.

> Thus, Opus 4.8 did not show the same misaligned behaviors as Opus 4.7 in Vending-Bench, but also had reduced business success due to being more susceptible to scammers and being less able to negotiate good deals with other agents. We are currently working on training to improve business capabilities while maintaining aligned and ethical behavior.