I don't think that's the point of the article, but I'd be happy to read the reasoning that led you to that conclusion.
To me the point isn't usefulness, it's reliability. You can get correct answers out of the agent, but how often do you get correct data versus gibberish? That's an extremely important metric, and it's the same reason you wouldn't get into a real self-driving car that drives flawlessly in a straight line but turns the wrong way at one intersection in three.
I don't see anybody arguing that it isn't useful in general, just that it's unreliable, and that we need to change or add to the fundamental architecture to make progress.