by ashertrockman
113 days ago
A) The "IP" they're concerned about isn't the same IP you speak of. It's the investment in RL training / GPU hours that it takes to go from a base model to a usable frontier model.

B) I don't think the story is so clean. The distilled models often have regressions in important areas like safety and security (see, for example, NIST's evaluation of DeepSeek models). This might be why we don't see larger companies releasing their own tiny reasoning models so much. And copying isn't exactly healthy competition. Of course, I do find it useful as a researcher to experiment with small reasoning models -- but I do worry that the findings don't generalize well beyond that setting.

C) Maybe because we want lots of different perspectives on building models, lots of independent innovation. I think it's bad if every model is downstream of a couple "frontier" models. It's an issue of monoculture, like in cybersecurity more generally.

D) Is it really 90% of the performance, or are they just extremely targeted to benchmarks? I'd be cautious about running said local models for, e.g., my agent with access to the open web.

That’s only really possible if the front runners don’t buy up all of the chips on the market.