my impression is that if open models did 'distill' claude they made some interesting and productive ideas, like deepseek's more efficient attention
my impression is that if open models did 'distill' claude they made some interesting and productive ideas, like deepseek's more efficient attention