But in the (hypothetical) limit where AI tools outperform all humans, what does this updated test look like? Are we even testing the humans at that point?