Hacker News
by ChadNauseam 9 hours ago
> Anthropic claims Mythos is in a class of its own, the evidence corroborates this and the government believes it.

They didn't release Mythos, they released Fable, which was Mythos + a classifier that detected potentially dangerous prompts and blocked them. Everyone who used it noticed how aggressive the classifier was. It would trigger constantly over totally innocent stuff.

1 comment

A classifier that was exposed as ineffective, for a product touted as having extremely dangerous capabilities.

I can generate hacks trivially by asking any model to fix open source code.

Let’s not pretend you get to have your cake and eat it too.