by ben_w
49 days ago
You have a common misunderstanding of what "time horizon" means. It is not "how long does the AI take to do ${thing}"; it is "how long does a *human* take to do ${thing}, where ${thing} is drawn from the set of tasks the AI gets right with probability n". In the METR studies, n happens to be 50% or 80%. At least, that's the short answer; here's a video with more depth: https://www.youtube.com/watch?v=evSFeqTZdqs

In my experience, the AI actually completes the task in a few minutes when it is a 2-ish hour task and the AI has a time horizon of 2 hours at P(correct) = 0.8. It is I, the human, not the AI I'm using, who would have taken 2 hours.
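To make the definition concrete, here is a rough sketch of how a horizon could be read off a fitted curve. As I understand it, the idea is to model P(AI succeeds) as a logistic function of the log of the human completion time, then solve for the human time at which that probability equals n. Everything below is my own illustration: the task data is invented, and the names `TASKS`, `fit_logistic` and `horizon_minutes` are hypothetical, not taken from METR's code.

```python
import math

# Invented data for illustration only:
# (minutes a human needs for the task, 1 if the AI got it right else 0)
TASKS = [
    (1, 1), (2, 1), (4, 1), (8, 1), (15, 1), (15, 0), (30, 1),
    (60, 1), (60, 0), (120, 0), (120, 1), (240, 0), (480, 0), (960, 0),
]


def fit_logistic(xs, ys, iters=25):
    """Fit P(success) = sigmoid(a + b * x) with Newton's method."""
    a, b = 0.0, 0.0
    for _ in range(iters):
        g_a = g_b = h_aa = h_ab = h_bb = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(a + b * x)))
            w = p * (1.0 - p)
            # Gradient of the log-likelihood
            g_a += y - p
            g_b += (y - p) * x
            # Entries of the (negated) Hessian
            h_aa += w
            h_ab += w * x
            h_bb += w * x * x
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system H * delta = g
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


def horizon_minutes(a, b, p):
    """Human task length (minutes) at which the fitted success rate is p."""
    logit = math.log(p / (1.0 - p))
    # x is log2(human minutes), so invert with a power of two
    return 2 ** ((logit - a) / b)


xs = [math.log2(minutes) for minutes, _ in TASKS]
ys = [ok for _, ok in TASKS]
a, b = fit_logistic(xs, ys)

h50 = horizon_minutes(a, b, 0.5)
h80 = horizon_minutes(a, b, 0.8)
print(f"50% horizon: {h50:.0f} human-minutes")
print(f"80% horizon: {h80:.0f} human-minutes")
```

Note that nothing in this calculation involves how long the AI itself ran. The only time on the x-axis is human time, and the 80% horizon comes out shorter than the 50% one because a higher success bar restricts you to easier (shorter-for-a-human) tasks.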
|
All I see now is celebration of how agents run for hours and handle “long-time horizons.”
That said, the original definition is also flawed for coding. How do you estimate, in hours, the time it takes to complete a coding task? If we had that formula, why have we been playing estimation poker or resorting to the Fibonacci series to predict software tasks? Because you can’t. It’s a made-up metric.