Hacker News
by johntb86 463 days ago
I'd be curious what would happen if you SFT'd a larger model on successful reasoning traces from the smaller model. Would it pick up the overall reasoning pattern and be able to apply it to more cases?