by andoando 106 days ago
The thinking models are additionally trained with reinforcement learning to produce chain-of-thought reasoning.