Hacker News
by manmal, 123 days ago
I have no proof, but these deep thinking modes feel to me like an orchestrator agent plus sub-agents, with the orchestrator RL'd to just keep going instead of being conditioned to stop ASAP.