The hiring tests are designed to serve as a predictor for human applicants. How well an LLM does on them doesn’t necessarily say anything about the usefulness of those tests as said predictor.
Well, what it shows is that hiring tests are not useful as Turing tests. But nobody designed them to be or expected them to be! At best, what it "proves" is that hiring tests are not sufficient. But again, nobody thought they were. And even so, the assumption that a human is taking the hiring test still seems reasonable. Why overengineer your process?