Hacker News
by johntb86, 463 days ago
I'd be curious what would happen if you SFTed a larger model with successful reasoning traces from the smaller model. Would it pick up the overall reasoning pattern, but be able to apply it to more cases?
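
To make the setup concrete, here is a rough sketch of the data-prep half of that experiment: keep only the smaller model's traces that reached the right answer and turn them into prompt/completion pairs for fine-tuning the larger model. Everything in it is an assumption for illustration: the `Trace` fields, the exact-match success check, the `Answer:` suffix, and the toy examples are all made up, and the actual SFT step (whatever trainer you'd use) is left out.

```python
import json
from dataclasses import dataclass


@dataclass
class Trace:
    """One sampled attempt from the smaller model (hypothetical schema)."""
    prompt: str
    reasoning: str
    answer: str
    reference: str  # known-correct answer, used only to judge success


def successful(trace: Trace) -> bool:
    # Assumption: success means an exact match with the reference answer.
    return trace.answer.strip() == trace.reference.strip()


def to_sft_records(traces):
    # Keep only successful traces; the completion is reasoning plus final answer.
    return [
        {"prompt": t.prompt, "completion": f"{t.reasoning}\nAnswer: {t.answer}"}
        for t in traces
        if successful(t)
    ]


def to_jsonl(records) -> str:
    # One JSON object per line, a common input format for fine-tuning scripts.
    return "\n".join(json.dumps(r) for r in records)


# Toy traces, invented for this example.
traces = [
    Trace("What is 17 + 25?", "17 + 25 = 17 + 20 + 5 = 42.", "42", "42"),
    Trace("What is 9 * 8?", "9 * 8 = 81.", "81", "72"),  # wrong, gets dropped
]
records = to_sft_records(traces)
print(to_jsonl(records))
```

Whether the larger model then applies the pattern to more cases is exactly the open question; this only shows how the successful traces might be filtered and packaged.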