HN Mirror

	by dnautics 51 days ago
	Public safety is downstream of distillation. If you can distill Claude, then no amount of guardrails on Claude will protect you from what someone can do with it.

2 comments

zozbot234 51 days ago

Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.

ACCount37 50 days ago

You can logit distill (full token probabilities) or one-hot distill (chat logs), or even align hidden states. All are distillation methods.
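
A minimal pure-Python sketch of the per-token loss behind each of the three methods named above, for a single position over a toy vocabulary. The function names are hypothetical, mean squared error is only one assumed choice for hidden-state alignment, and real training code would batch over whole sequences in a framework such as PyTorch.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw logits into probabilities (max-subtracted for stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def logit_distill_loss(teacher_logits, student_logits):
    """Logit distillation: KL(teacher || student) over the whole vocabulary.
    Needs the teacher's full distribution, i.e. access to its logits."""
    p = softmax(teacher_logits)
    q = softmax(student_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def one_hot_distill_loss(teacher_token, student_logits):
    """One-hot distillation: cross-entropy against the single token the
    teacher emitted, which is all a chat log records."""
    q = softmax(student_logits)
    return -math.log(q[teacher_token])


def hidden_state_loss(teacher_hidden, student_hidden):
    """Hidden-state alignment (assumed MSE): compares two vectors of the same
    width; real setups often add a learned projection, omitted here."""
    diffs = [(t - s) ** 2 for t, s in zip(teacher_hidden, student_hidden)]
    return sum(diffs) / len(teacher_hidden)


# Toy example: a four-token vocabulary at one position.
teacher_logits = [2.0, 1.0, 0.1, -1.0]
student_logits = [1.5, 1.2, 0.0, -0.5]
# Stand-in for the token a chat log would show (greedy decoding assumed).
teacher_token = max(range(len(teacher_logits)), key=lambda i: teacher_logits[i])

print(logit_distill_loss(teacher_logits, student_logits))
print(one_hot_distill_loss(teacher_token, student_logits))
print(hidden_state_loss([0.3, -0.1], [0.2, 0.4]))
```

The difference is what the student gets to see: the logit loss uses the teacher's entire distribution at every position, while the one-hot loss only sees the emitted token, so the teacher's relative preferences among the other tokens are lost.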

senordevnyc 51 days ago

If most people call it that, including the big labs, then maybe…you’re just out of date?

ericpauley 51 days ago

If Anthropic is calling it distillation [1], then that would argue for it being correct (or at least canonical) terminology.

[1] https://www.anthropic.com/news/detecting-and-preventing-dist...

dannyw 51 days ago

No, a company choosing to use some terminology doesn’t make it correct or canonical in any sense, especially when it has a vested interest and can’t be treated as neutral or credible.

If Google starts calling ads “Best Links”, that doesn’t make it correct or canonical; the correct term is still ads.

Traditionally, distillation is when you get the actual logits of a model response (not exposed via API for years) and then use them to train another model.
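
The classic logit-based recipe (Hinton, Vinyals and Dean, "Distilling the Knowledge in a Neural Network", 2015) softens both the teacher's and the student's distributions with a temperature before comparing them. Below is a minimal pure-Python sketch under that assumption; `soft_target_loss` is a hypothetical helper name, and the T² factor follows the paper's suggestion for keeping gradient scale comparable across temperatures.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax (max-subtracted for numerical stability)."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def soft_target_loss(teacher_logits, student_logits, temperature=2.0):
    """Hypothetical helper: KL(teacher || student) on temperature-softened
    distributions, multiplied by T^2."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


# Raising the temperature moves probability onto the non-top tokens,
# exposing the teacher's ranking of alternatives that a single logged
# token would not reveal.
teacher = [4.0, 1.0, 0.5, -2.0]
for t in (1.0, 4.0):
    print(t, [round(p, 3) for p in softmax(teacher, t)])

print(soft_target_loss(teacher, [3.0, 1.5, 0.0, -1.0], temperature=4.0))
```

None of this is possible from chat logs alone, since the loss needs the teacher's logits (or at least its full probability vector) at every position.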

cherryteastain 51 days ago

This logic works only if distilling Claude is the only way to create another SOTA LLM, which is not the case.

maxdo 51 days ago

It's not, but the full path costs billions of dollars, versus something in the $10-100M range to stay near SOTA.

The problem is large enough that distillation attempts generally account for a decent share of their token revenue.

sciencejerk 51 days ago

How do you think the Qwen and MiniMax models perform so similarly to Anthropic frontier models? What is your take then?

jackjeff 50 days ago

Well, Anthropic did not ask for permission before they distilled copyrighted material.

At least the Chinese labs have the decency to give back the model weights and not add BS censorship because “it’s too dangerous”.

cebert 50 days ago

Ask DeepSeek about Tiananmen Square and see what happens. The Chinese models have censorship too.

mcmcmc 51 days ago

They probably stole all the same copyrighted IP

_3u10 51 days ago

Probably for the same reason an EPYC 9965 from Hetzner performs just as well as one from AWS at one-tenth the cost.

Anthropic is offering a commodity product and trying to convince you it isn’t.

It’s even in the name: it’s a myth and a fable. Never happened, doesn’t exist.

Also I believe at least on coding that qwen is now the frontier model, fable is its copy of frontier models, in the same way that the Ferrari Luce is an expensive imitation of an SU7 Ultra.

abletonlive 51 days ago

> Also I believe at least on coding that qwen is now the frontier model

The delusions people live in just to be a hater.

yeeeloit 51 days ago

China no. 1?
