
by ssgodderidge 112 days ago

At the very bottom of the article, they posted the system card of their Mythos preview model [1].

Section 7.6 of the system card discusses open self-interactions. They describe running 200 conversations in which the model talks to itself for 30 turns.

> Uniquely, conversations with Mythos Preview most often center on uncertainty (50%). Mythos Preview most often opens with a statement about its introspective curiosity toward its own experience, asking questions about how the other AI feels, and directly requesting that the other instance not give a rehearsed answer.

I wonder if this tendency toward uncertainty, toward questioning, makes it uniquely equipped to detect vulnerabilities that other models such as Opus couldn't.

[1] https://www-cdn.anthropic.com/53566bf5440a10affd749724787c89...


dakolli 112 days ago

Typical Dario marketing BS to get everyone thinking Anthropic is on the verge of AGI and massaging the narrative that regular people can't be trusted with it.


khalic 112 days ago

Ah yes, much better to completely ignore the issue like all the others. Ffs people are never happy


airstrike 112 days ago

I mean, it's so obvious at this point, and yet everyone falls for it every month. There's an IPO coming, everyone.


mgambati 112 days ago

It’s funny how you train a machine to mimic human behavior, then the marketing team decides to promote it with “Look! It’s human! Look how it’s thinking about existence!” while a huge percentage of human-produced content is exactly about the uncertainty of human existence, and that content was used to train the model.


ehnto 112 days ago

I see us collectively forgetting the training process as time goes on, and I think that explains why people get so surprised by some pretty obvious outcomes of said training. Perhaps also why people keep anthropomorphising these outcomes.


qnleigh 111 days ago

This is buried in section 7.6 of a 244-page document. Amodei probably hasn't even read it.
