But the benchmark didn't ask those questions, and it seems Grok is very good at saying it doesn't know the answer otherwise.