by johnfn
177 days ago
Really, you haven't found a single task they can't do? I like agents, but this seems a little unrealistic? Recently, I asked Codex and Claude both to "give me a single command to capture a performance profile while running a playwright test". Codex worked on this one for at least 2 hours and never succeeded, even though it really isn't that hard.

That made the test pass of course, leaving the code as broken as it ever was. Guess that one was on me though, I never specified it shouldn't do that...