I'm deliberately blurring refusal with having an accurate picture of its own abilities and, past that, having an accurate picture of of what it can do given tools. Both are tested by
"Can you X?"
With refusal you find just how shallow it is because it really will answer all sorts of questions that are "helpful" in making a nuclear bomb but when you ask it directly it shuts up. In another sense nothing it does is "helpful" because it's not going to hunt down some people in central asia who have 50kg of U235 burning a hole in their pocket for you, which is what would actually "help".
I use tool using LLMs frequently, but I find they frequently need help using their tools, it is a lot of fun to talk to Windsurf about the struggles it has with its tools and it feels strangely satisfying to help it out.
I use tool using LLMs frequently, but I find they frequently need help using their tools, it is a lot of fun to talk to Windsurf about the struggles it has with its tools and it feels strangely satisfying to help it out.