by a2128 409 days ago
To be clear, this is not a model trained on zero data. It is a pretrained model (Qwen 2.5, trained on 18 trillion tokens) finetuned on self-generated data grounded by a Python interpreter.
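For readers unsure what "grounded by a Python interpreter" could mean in practice, here is a minimal sketch, not the actual training code. It assumes the model proposes a program plus an input, and that running the program supplies the reference answer, so no human label is needed. The names `run_program`, `make_task`, `reward`, and the example program are all hypothetical.

```python
from typing import Any


def run_program(source: str, arg: Any, func_name: str = "f") -> Any:
    """Execute `source` and call the function it defines on `arg`."""
    namespace: dict = {}
    # No sandboxing here; a real system would isolate untrusted code.
    exec(source, namespace)
    return namespace[func_name](arg)


def make_task(source: str, arg: Any) -> dict:
    """Turn a proposed (program, input) pair into a task with a verified answer."""
    return {"program": source, "input": arg, "answer": run_program(source, arg)}


def reward(task: dict, predicted: Any) -> float:
    """Score a predicted output against the interpreter-derived answer."""
    return 1.0 if predicted == task["answer"] else 0.0


# Stand-in for a model-proposed program (an assumption for illustration).
proposed_source = "def f(xs):\n    return sorted(set(xs))\n"
task = make_task(proposed_source, [3, 1, 3, 2])
```

Calling `reward(task, prediction)` then gives a training signal that comes from execution rather than from a human-written label.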
2 comments

I think at this point the initial step of exposing the untrained model to all the available domain data in bulk is no longer interesting to many people. It's an obvious first step, so it's barely mentioned anymore. What's being worked on now is what you do afterwards to end up with a useful tool.
The breakthrough here is eliminating the need for human-labeled reasoning data, which has been a major bottleneck in developing reasoning capabilities, while still achieving SOTA results.