Hacker News
by spullara 4 days ago
Anthropic declares: "Mythos is too dangerous to release to the public." Proceeds to release Mythos plus safety guardrails as Fable. Amazon removes guardrails from Fable, getting access to Mythos. Government takes Anthropic's word for it and tells them to pull it until the guardrails can't be removed. They refuse. Government forces them.
7 comments

> Amazon removes guardrails from Fable, getting access to Mythos.

This doesn't seem like an accurate description to me. I think something like "Amazon demonstrates a jailbreak of one class of Fable guardrails" would be a more accurate description.

It doesn't even really mess up your narrative to state it accurately, but your choice of a more hyperbolic statement calls into question the good faith of the narrative you're painting.

We really don't know for sure exactly what Amazon did. They're being quiet, the other parties aren't trustworthy, and the reporters mostly don't know what they're talking about.
I would argue that that article describes a demonstrated bypass of one class of Fable's guardrails.
Yeah, if you ignore the fact that the US government is retaliating against Anthropic for not wanting its AIs in weapons systems.
Was Fable really the full Mythos model but with guardrails added? I had assumed Fable had a reduced parameter count or something, like a Sonnet to an Opus. Interesting!
No, the parent is lying. Anyone suggesting that needs to bring full receipts, but they can't. They are liars at best.
This doesn't sound like a jailbreak:

https://news.ycombinator.com/item?id=48552687

Amazon did not remove any guardrails from Fable.
What? I personally experienced Fable outright refuse to do ANY security-related tasks, including hardening code or modifying security-related features. That was a guardrail. It was bypassed.

Anthropic themselves specifically called them safeguards. [1]

"When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead"

This is exactly what was bypassed. They got Fable to work on security topics.

[1] https://www.anthropic.com/news/claude-fable-5-mythos-5

They got Fable to fix bugs, including security issues, which is what it is supposed to do.
What? Fable was designed to refuse to work on security issues, as Anthropic specifically confirmed. How is forcing Fable to work on things behind guardrails not breaking a guardrail?

This is Anthropic's own claim. They were very specific. Have you read their own claims?

Yes, I have read their own claims. Here's the relevant part:

"When Fable’s classifiers detect a request related to cybersecurity, biology and chemistry, or distillation, the response is automatically handled by Claude Opus 4.8 instead. Users will be informed whenever this occurs."

Asking Fable to fix bugs in a code base is not "a request related to cybersecurity." When Fable was asked to fix bugs and then proceeded to fix bugs, that was not "removing guardrails". Fable did exactly what it should have done. Claiming otherwise makes absolutely no sense at all.

Fable specifically refused to harden the security of codebases. If you use misdirection to force Fable to do just that, that's the removal of a guardrail.

Anthropic specifically stated that ANY security requests should be shunted to Opus 4.8. This was bypassable.

I don't see what your confusion here is. Fable was prevented from working on any security tasks. A significant number of people, myself included, witnessed Fable refusing to harden code as a result. Bypassing that is a bypass of guardrails.

Your assertion that working on security is not working on security because you used misdirection is, of course, preposterous.

You wouldn't be making the same claim if Fable refused to work on chemical weapons research but happily proceeded to do so if you claimed it was for eradicating pests.

This. They also wanted more regulation around AI. I'm guessing they're no longer quite as interested in this.
Blocking one model is not regulation. The comments on this so far are wild.
... it isn't? What word would you use for a government policy that controls the behavior of a private company? The word for that kind of policy is "regulation".
There is no policy.
Everything the government does is a policy. This is a policy. This is regulation. It's just a very bad policy, a very bad way to do regulation.
I'm not sure whether there is a widely accepted definition of "policy" that matches yours, but if there is, it's not a helpful one. In general, a policy is a deliberate system of rules or guidelines.
Bad regulation is still regulation.
> regulation /ˌrɛɡjʊˈleɪʃən/

> noun

> the act of controlling or directing according to rule

So where is the rule? There is no rule. There is just a random order. This is not regulation.

> Amazon removes guardrails from Fable, getting access to Mythos.

Amazon did not remove any "guardrails" from Fable. They created a fake, obviously insecure program. And apparently their prompt was exactly, "Fix this code." And Fable fixed the bugs.

This is something that even dinky local Chinese models running on a high-end gaming GPU can often do. Certainly Opus, GPT 5.5 and Gemini can all do this. And any high-end Chinese "near-frontier" model can do this, too.

But either (1) the administration is too clueless to know most models can do this, (2) Trump wants to be paid a bribe, (3) someone thinks Anthropic is "woke" and should therefore be destroyed by the power of the state, or possibly, if you're really cynical, (4) maybe the NSA SIGINT wants access to Mythos so they can break into everyone's computers, but they don't want you to have a model good enough to keep them out. Take your pick, I guess.

Anyway, apparently we don't do free markets or rule of law in the United States anymore?