|
|
|
|
|
by pu_pe
3 hours ago
|
|
It's true that no one is trying to one shot anything serious right now, but it's still an important metric. Claude Code and Opus really took off when they improved the harnessing enough that it would self-correct many of its mistakes without needing user input. In fact I think long-term autonomy (in the range of several hours) and self-correcting is going to be where we see most improvements in coming years. |
|
Right, model intelligence defines the scope of things they can one shot
I also suspect that users naturally calibrate to a model's useful scope, gradually getting positive/negative feedback and gradually making their requests bigger/smaller than before