by anonfunction
47 days ago
I've been doing research on bias and misaligned behavior, building custom private eval suites to test and compare models. Claude Opus 4.7 is heavily biased and presents clear regulatory and reputational risk. The initial product footprint seems to sidestep this problem by not giving the agents control over whom to lend to or which applications to approve. Even so, I think that's quite an optimistic read on their end. Happy to share reports with anyone who's interested (montana@latentevals.com), especially if you work at a frontier model lab and are interested in plugging my evals into your RL systems!
|
All I did was upgrade Claude Code and use the new model. It most definitely exhibits misaligned behavior (compared to 4.6).