by ahmedhawas123 325 days ago
This is super helpful and I had not seen it, thanks so much for sharing! And I hear you on training being an alpha. At the size of the model, I wonder how much of this is distillation and using o3/o4 data.