by singron 3 hours ago
Other companies were allegedly distilling the models by training on the reasoning output. Hiding the reasoning tokens makes this harder. You can still try to distill the models, but you can't distill the reasoning itself as effectively.

This could also be optics, meant to give the appearance of a defensible moat. For example, they can tell investors that this lets them protect a significant chunk of their intellectual property. I'm not aware of any study on how much summarizing the reasoning actually hinders distillation.


> Other companies were allegedly distilling the models by training on the reasoning output

For makers of open-source models (who are also competitors), there is no "allegedly": they were (and still are) openly doing that.

The same goes for closed models: Claude would happily tell you it was deepseek-v3 if you asked in Chinese, until this caught public attention and they papered over it.