public safety is downstream of distillation. If you can distill claude, then no amount of guardrails on claude will protect you from what someone can do with it.
Distillation is not a thing unless you actually have the model weights. What people misleadingly call distillation is just training on chat logs, which has always been routine practice in the industry. There's a reason why every model today talks like early releases of ChatGPT.
No, a company choosing to use some terminology doesn’t make it correct nor canonical in any sense; especially when they have a vested interest in not being neutral or credible.
If Google starts calling ads “Best Links” that doesn’t make it correct nor canonical; the correct term is still ads.
Traditionally, distillation is when you get the actual logits of a model response (not exposed via API for years) and then use that to train a model.
Probably the same reason a Epyc 9965 from hetzner performs just as well as one from AWS for one tenth the cost.
Anthropic is offering a commodity product and trying to convince you it isn’t.
It’s even in the name, it’s a myth and a fable. Never happened doesn’t exist.
Also I believe at least on coding that qwen is now the frontier model, fable is its copy of frontier models. In the same way that the Ferrari Luce is an expensive imitation of a SU7 Ultra.