by throwaway314155
548 days ago
> an automated tool doing a complex amount of steps that you'd normally expect an average-ish worker to do for you on a RELIABLE rate basis. i.e., doing your taxes like your accountant or 10 year old hopefully does.

That seems to be the definition they're using. It's a high bar, in my opinion, but it does illustrate how hard it will be for current systems to meet an exceptionally high (human-grade) standard of quality. Defining it this way and measuring the percentage of tasks failed compared to a typical (expert?) human doing the same work gives valuable insight, in my opinion. On the other hand, you can define an agent as anything that does tool calling, but then creating an agent becomes trivial, while meeting the typical consumer's expectations remains non-trivial, because you aren't observing its failure rates.