by rdli
488 days ago
The blog post was a little unclear, so my summary was:

- They used QwQ to generate training data (with some cleanup using GPT-4o-mini)
- The training data was then used to fine-tune (FT) Qwen2.5-32B-Instruct (a non-reasoning model)
- The result was that Sky-T1 performs slightly worse than QwQ but much better than Qwen2.5 on reasoning tasks

There are a few dismissive comments here, but I actually think this is pretty interesting, as it shows how you can FT a foundation model to do better at reasoning.
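For anyone who wants the data side of that summary in concrete terms, here is a rough sketch of the first two steps (teacher generates, a second model cleans up). This is not the Sky-T1 code: `build_sft_dataset`, the `teacher` and `cleanup` callables, and the rule that a `None` from cleanup drops the sample are all hypothetical, chosen only to illustrate the shape of the pipeline. In a real setup the callables would wrap calls to something like QwQ and GPT-4o-mini.

```python
from typing import Callable, Dict, List, Optional


def build_sft_dataset(
    prompts: List[str],
    teacher: Callable[[str], str],
    cleanup: Callable[[str], Optional[str]],
) -> List[Dict[str, str]]:
    """Collect (prompt, response) pairs for supervised fine-tuning.

    teacher: hypothetical wrapper around the reasoning model that
             writes a full answer for a prompt.
    cleanup: hypothetical wrapper around the smaller model that tidies
             the answer; returning None is an assumed convention here
             meaning "discard this sample".
    """
    dataset: List[Dict[str, str]] = []
    for prompt in prompts:
        raw = teacher(prompt)       # step 1: teacher model generates a response
        cleaned = cleanup(raw)      # step 2: second model cleans it up
        if cleaned is None:         # assumed rule: unusable outputs are skipped
            continue
        dataset.append({"prompt": prompt, "response": cleaned})
    return dataset


if __name__ == "__main__":
    # Stand-in callables so the sketch runs without any model access.
    def fake_teacher(prompt: str) -> str:
        return f"  reasoning about {prompt} ... final answer  "

    def fake_cleanup(text: str) -> Optional[str]:
        stripped = text.strip()
        return stripped or None

    for row in build_sft_dataset(["2 + 2", "sort a list"], fake_teacher, fake_cleanup):
        print(row)
```

The third step, fine-tuning the non-reasoning base model on the resulting pairs, is left out; it would be an ordinary supervised fine-tuning run over this dataset.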