| HN Mirror



	by bs7280 50 days ago
	I think the reasonable middle ground Anthropic is trying to achieve is this: let the organizations that make the most important and critical software get a head start on cybersecurity before they inevitably allow everyone else the same access. Other commenters have made good points that these guardrails are counterproductive for well-intentioned cybersecurity work, because I can't use it to test and harden my own software.

8 comments

I think it's a big mistake to conflate the cyber (and bio) refusals with the LLM development refusals.

I can sympathize with the argument for the cyber refusals - especially as a temporary measure - especially if Mythos is available to those trying to defend against vulnerabilities.

The LLM development nerfing (and now refusals) is very different though. Anthropic has even said it isn't just for safety reasons:

> Using Claude to develop competing models already violates our Terms of Service, but enforcing this restriction through our safeguards avoids accelerating the actors most willing to violate these terms.

It's at least partially an anti-competitive measure.

The closest analogy is putting measures in a compiler to stop it being able to build other compilers.

Another analogy is priesthoods with secret religious knowledge that "only they are qualified to know".

dannyw 50 days ago

The Anthropic refusal description is even more direct.

“The request could assist the development of competing AI models, which is restricted under Anthropic's commercial terms. Benign machine learning work can also trigger this category.”

Source: https://platform.claude.com/docs/en/build-with-claude/refusa...

thefounder 50 days ago

As we’ve seen with Fable, Mythos is more of a hype myth to justify the data retention and restrictions they added. Otherwise it’s just an incremental update of Opus. I can’t really say upgrade, because the restrictions make it a downgrade.

antonvs 50 days ago

> especially if Mythos is available to those trying to defend against vulnerabilities.

You’re buying into the hype they’re trying to create here.

sciencejerk 50 days ago

Claude Opus 4.6 and 4.8 find vulns in source code just fine and 4.6 will pentest without source for you given a proper harness WITHOUT jailbreaking. WITH jailbreaks, you can probably imagine what they are capable of.

Anthropic guardrails seem to be more about protecting their business (distillation), than they are about public safety.

dnautics 50 days ago

Public safety is downstream of distillation. If you can distill Claude, then no amount of guardrails on Claude will protect you from what someone can do with it.

zozbot234 50 days ago

Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.

ACCount37 50 days ago

You can logit distill (full token probabilities) or one hot distill (chat logs), or even align hidden states. All are distillation methods.
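A minimal sketch of the first two variants named above, assuming a toy three-token vocabulary. The function names are illustrative and not taken from any library. Logit distillation matches the teacher's full next-token distribution; one-hot distillation only sees the token the teacher actually emitted, which is all a chat log contains. Hidden-state alignment is left out because it needs access to the teacher's internals.

```python
import math

def softmax(logits, temperature=1.0):
    # Numerically stable softmax over a list of raw logits.
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def logit_distill_loss(teacher_logits, student_logits, temperature=1.0):
    # KL(teacher || student) over the full token distribution.
    # Requires the teacher's logits (or probabilities) for every token.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def one_hot_distill_loss(teacher_token, student_logits):
    # Cross-entropy against the single token the teacher emitted.
    # Only needs the teacher's text output, i.e. a chat log.
    q = softmax(student_logits)
    return -math.log(q[teacher_token])

teacher = [2.0, 1.0, 0.1]   # hypothetical teacher logits
student = [1.5, 1.2, 0.3]   # hypothetical student logits

print(logit_distill_loss(teacher, student))
print(one_hot_distill_loss(0, student))
```

The practical difference is the signal per token: the KL version tells the student how much probability the teacher put on every alternative, while the one-hot version only says which token won.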

senordevnyc 50 days ago

If most people call it that, including the big labs, then maybe…you’re just out of date?

ericpauley 50 days ago

If Anthropic is calling it distillation [1] then that would argue for it being correct (or at least canonical) terminology.

[1] https://www.anthropic.com/news/detecting-and-preventing-dist...

dannyw 50 days ago

No, a company choosing to use some terminology doesn’t make it correct nor canonical in any sense; especially when they have a vested interest in not being neutral or credible.

If Google starts calling ads “Best Links” that doesn’t make it correct nor canonical; the correct term is still ads.

Traditionally, distillation is when you get the actual logits of a model response (not exposed via API for years) and then use that to train a model.

cherryteastain 50 days ago

This logic works only if distilling Claude is the only way to create another SOTA LLM, which is not the case.

maxdo 50 days ago

It's not, but the full path costs billions of dollars, versus the $10-100M range to stay near SOTA.

The problem is so large in scale that distillation attempts account for a decent share of their token revenue.

sciencejerk 50 days ago

How do you think the Qwen and MiniMax models perform so similarly to Anthropic frontier models? What is your take then?

jackjeff 50 days ago

Well Anthropic did not ask for permission before they distilled copyrighted material.

At least the Chinese have the decency of giving back the model weights and not put BS censorship because “it’s too dangerous”.

cebert 50 days ago

Ask DeepSeek about Tiananmen Square and see what happens. The Chinese models have censorship too.

mcmcmc 50 days ago

They probably stole all the same copyrighted IP

_3u10 50 days ago

Probably the same reason an Epyc 9965 from Hetzner performs just as well as one from AWS for one tenth the cost.

Anthropic is offering a commodity product and trying to convince you it isn’t.

It’s even in the name: it’s a myth and a fable. Never happened, doesn’t exist.

Also, I believe that at least on coding, Qwen is now the frontier model, and Fable is its copy of frontier models. In the same way that the Ferrari Luce is an expensive imitation of an SU7 Ultra.

abletonlive 50 days ago

> Also I believe at least on coding that qwen is now the frontier model

The delusions people live in just to be a hater.

yeeeloit 50 days ago

China no. 1?

ryandrake 50 days ago

I wonder who gets to decide which companies make important and critical software and which ones get the scraps later.

margalabargala 50 days ago

No need to wonder.

The answer is, the organization making the powerful tool. The people in charge of Anthropic.

Not only that, but they've also written at length about exactly what their opinions and values are: https://darioamodei.com/

You may not agree with the decisions that they make, but they're hardly mysterious. Not something to wonder about.

Laurel1234 50 days ago

Amodei has no values, he's a hollow husk and he'd sell his family into sex slavery if it could make him a buck.

margalabargala 50 days ago

Nonsense. Everyone has values. "Make myself maximum money" is a value. "Amass maximum power over the world's information" is a value. It's clear Amodei certainly follows the latter, and I would soften the former somewhat for him; they did after all decline the Pentagon contract that would have made money but would have meant giving up some control of information.

Laurel1234 49 days ago

Those aren't values. Maybe goals or motivations but not values in any conceivable way, shape or form. This site is full of pod people I swear.

margalabargala 49 days ago

Maybe it would help if you shared your private personal definition of "value", since you're clearly not using the one from the dictionary...

trollbridge 50 days ago

The one they ended up going “well I guess we’ll contract with them after all”, after cleverly using their sort-of-refusal to gain a ton of goodwill and new customers?

margalabargala 50 days ago

Yes, because it changed slightly, addressing their complaint. The complaint was small in scope.

Again, just because someone has values, doesn't mean they have values you think are good.

criddell 50 days ago

That would be Anthropic.

CamperBob2 50 days ago

Well, Anthropic thinks it should be the Trump administration [1].

This whole business just keeps getting dumber.

1: https://darioamodei.com/post/policy-on-the-ai-exponential

solenoid0937 50 days ago

Read the actual essay. I cannot possibly imagine how you come to that conclusion unless you're just arguing in bad faith.

CamperBob2 50 days ago

No. You read the actual essay, then explain how we're supposed to interpret this more charitably:

    Frontier AI models, like airplanes, should 
    be required to go through technical testing 
    and auditing, and their release should be 
    blocked or reversed as a threat to public 
    safety if they do not meet high standards 
    of safety. I am grateful to see the Trump 
    administration’s Executive Order move 
    incrementally towards a greater role for 
    government in AI, though Anthropic’s proposal 
    recommends even further action.

They are all-but-literally sucking up to the administration that declared their company a supply-chain risk, arguing that the same administration should be given gatekeeping authority over all high-quality LLMs including open-weight releases. Go gaslight somebody else.

yonaguska 50 days ago

I agree with your sentiment but not your conclusion. They don't want this administration specifically to have gatekeeping authority, what they want is any administration to say that they are gatekeeping, so that they can regulate the competition out of existence. Of course the actual checks and balances will be near pointless in effect, but expensive to implement nonetheless.

solenoid0937 50 days ago

This is a pretty reasonable statement and I'm not sure how you could interpret this as "sucking up to the admin."

FrustratedMonky 50 days ago

How do you get "Anthropic thinks it should be the Trump administration"

From that paragraph?

Even granting it is sucking up, that is not replacing.

arkadiytehgraet 50 days ago

You got baited by a confirmed Anthropic shill, see more info here: https://news.ycombinator.com/item?id=48270186

whywhywhywhy 50 days ago

The security guardrails are one thing, but they extended them to AI work unrelated to security too, to protect their lead.

pseudohadamard 50 days ago

I see it more as a lose/lose: Any malicious user/attacker will just bypass the guardrails using one of a million established techniques for doing so while legit developers and security researchers will be prevented from finding problems by them.

wouldbecouldbe 50 days ago

I asked it to analyse my architecture and find any security issues and it did it perfectly, first identified the issues & then fixed them. Not sure why my prompt managed to get through the guardrails

pwython 50 days ago

I asked Fable to plan a security & performance audit of my website. It said it would check SSR & origin attack surface, CMS content injection, Strapi API surface, etc.

Just before asking for approval to run, it said one thing it wanted to "flag before running" was "Rate-limit and auth testing against prod will generate some 4xx noise in Railway logs and could trip the form rate limiter — harmless, but saying it now."

Ok fine, I said go for it, and it says:

"Running it. Quick recon first (prod URLs + the prior-findings baseline), then I'll fan out the audit tracks with adversarial verification."

Immediately after, I got the Fable warning about how it can't continue because of safety concerns, switching to Opus. In the end, Opus did a good job thanks to whatever Fable suggested doing. Things were fixed that Opus missed in a security/performance audit just the week prior. But what surprised me is that it used 55 agents. Burned 80% of my 5-hour window in 15 minutes (5x Max plan). I've never had Opus do that before on these audits.

wouldbecouldbe 49 days ago

I've seen Opus doing it more and more too, spinning up multiple agents, so maybe it's a Claude Code update?

notrealyme123 50 days ago

Exactly. For cybersecurity the failure was visible. It was not visible for "Frontier" ML research. The argument about a head start in IT security does not apply here.

thefounder 50 days ago

There is no middle ground to shadow bans while taking your hard-earned cash. It is fraud, a Nigerian scam.