|
|
|
|
|
by ssgodderidge
65 days ago
|
|
At the very bottom of the article, they posted the system card of their Mythos preview model [1]. In section 7.6 of the system card, it discusses Open self interactions. They describe running 200 conversations when the models talk to itself for 30 turns. > Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer. I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities where others model such as Opus couldn't. [1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89... |
|