Hacker News
by bs7280 10 hours ago
I think the reasonable middle ground Anthropic is trying to achieve is this: let the organizations that make the most important and critical software get a head start on cybersecurity before Anthropic inevitably allows everyone else the same access.

Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity, because I can't use the model to test and harden my own software.

I think it's a big mistake to conflate the cyber (and bio) refusals with the LLM development refusals.

I can sympathize with the argument for the cyber refusals, especially as a temporary measure, and especially if Mythos is available to those trying to defend against vulnerabilities.

The LLM development nerfing (and now refusals) is very different though. Anthropic has even said it isn't just for safety reasons:

> Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

It's at least partially an anti-competitive measure.

The closest analogy is putting measures in a compiler to stop it from being able to build other compilers.

Another analogy is priesthoods with secret religious knowledge that "only they are qualified to know".

The Anthropic refusal description is even more direct.

“The request could assist the development of competing AI models, which is restricted under Anthropic's commercial terms. Benign machine learning work can also trigger this category.”

Source: https://platform.claude.com/docs/en/build-with-claude/refusa...

Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.

Anthropic's guardrails seem to be more about protecting their business (distillation) than they are about public safety.

Public safety is downstream of distillation. If you can distill Claude, then no amount of guardrails on Claude will protect you from what someone can do with it.
Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.
If most people call it that, including the big labs, then maybe…you’re just out of date?
If Anthropic is calling it distillation [1] then that would argue for it being correct (or at least canonical) terminology.

[1] https://www.anthropic.com/news/detecting-and-preventing-dist...

No, a company choosing to use some terminology doesn’t make it correct or canonical in any sense, especially when they have a vested interest in not being neutral or credible.

If Google starts calling ads “Best Links” that doesn’t make it correct or canonical; the correct term is still ads.

Traditionally, distillation is when you get the actual logits of a model response (not exposed via API for years) and then use that to train a model.
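
To make the logit-based definition concrete, here is a minimal sketch of the classic soft-target loss. Everything in it is an illustrative assumption rather than anything from this thread or from any lab's pipeline: the function names `log_softmax` and `distillation_loss` are hypothetical, the logits are made-up numbers, and the temperature of 2.0 is arbitrary. It only shows why access to the teacher's logits matters: the student is trained against the teacher's whole output distribution, not just the sampled text.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_z = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_z for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The temperature**2 factor is the usual scaling that keeps gradient
    magnitudes comparable across temperatures.
    """
    log_p = log_softmax(teacher_logits, temperature)  # teacher (target)
    log_q = log_softmax(student_logits, temperature)  # student (prediction)
    kl = sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))
    return temperature ** 2 * kl


if __name__ == "__main__":
    teacher = [2.0, 1.0, 0.1]  # made-up logits for a 3-token vocabulary
    student = [0.1, 1.0, 2.0]
    print(distillation_loss(teacher, student))
```

Training on chat logs, by contrast, only ever sees the one token the teacher happened to sample at each step, which is ordinary supervised fine-tuning on text.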

This logic works only if distilling Claude is the only way to create another SOTA LLM, which is not the case.
It's not, but the full path costs billions of dollars, versus the $10-100M range to stay near SOTA.

The problem is so large in scale that distillation attempts account for a decent share of their token revenue overall.

How do you think the Qwen and MiniMax models perform so similarly to Anthropic frontier models? What is your take then?
They probably stole all the same copyrighted IP
Probably the same reason an Epyc 9965 from Hetzner performs just as well as one from AWS for one tenth the cost.

Anthropic is offering a commodity product and trying to convince you it isn’t.

It’s even in the name: it’s a myth and a fable. Never happened, doesn’t exist.

Also, I believe that, at least on coding, Qwen is now the frontier model; Fable is its copy of frontier models. In the same way that the Ferrari Luce is an expensive imitation of an SU7 Ultra.

> Also, I believe that, at least on coding, Qwen is now the frontier model

The delusions people live in just to be a hater.

China no. 1?
I wonder who gets to decide which companies make important and critical software and which ones get the scraps later.
No need to wonder.

The answer is the organization making the powerful tool: the people in charge of Anthropic.

Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/

You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.

Amodei has no values, he's a hollow husk and he'd sell his family into sex slavery if it could make him a buck.
Nonsense. Everyone has values. "Make myself maximum money" is a value. "Amass maximum power over the world's information" is a value. It's clear Amodei certainly follows the latter, and I would soften the former somewhat for him; they did after all decline the Pentagon contract that would have made money but would have meant giving up some control of information.
That would be Anthropic.
Well, Anthropic thinks it should be the Trump administration [1].

This whole business just keeps getting dumber.

1: https://darioamodei.com/post/policy-on-the-ai-exponential

Read the actual essay. I cannot possibly imagine how you come to that conclusion unless you're just arguing in bad faith.
No. You read the actual essay, then explain how we're supposed to interpret this more charitably:

    Frontier AI models, like airplanes, should 
    be required to go through technical testing 
    and auditing, and their release should be 
    blocked or reversed as a threat to public 
    safety if they do not meet high standards 
    of safety. I am grateful to see the Trump 
    administration’s Executive Order move 
    incrementally towards a greater role for 
    government in AI, though Anthropic’s proposal 
    recommends even further action. 
They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.
I agree with your sentiment but not your conclusion. They don't want this administration specifically to have gatekeeping authority; what they want is for any administration to say that it is gatekeeping, so that they can regulate the competition out of existence. Of course the actual checks and balances will be near pointless in effect, but expensive to implement nonetheless.
This is a pretty reasonable statement and I'm not sure how you could interpret this as "sucking up to the admin."
You got baited by a confirmed Anthropic shill, see more info here: https://news.ycombinator.com/item?id=48270186
I asked it to analyse my architecture and find any security issues, and it did it perfectly: it first identified the issues and then fixed them. Not sure why my prompt managed to get through the guardrails.
I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc.

Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."

Ok fine, I said go for it, and it says:

"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."

Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.

Exactly. For cybersecurity the failure was visible. It was not visible for "Frontier" ML research. The head-start argument made for IT security does not apply here.