The trick is to phrase the problem so that GPT-4 will always give an incorrect answer (because the problem statement is vague) and so that multiple rounds of guiding and correcting are needed to solve it.
That's pretty good because it can exhaust the context window quickly, at which point the model starts spiraling out of control and the candidate has to step in.
If you only use ChatGPT to code, all you can do is copy and paste the LLM-emitted code and then ask for changes to it (for example, to reflect the evolution of the product).
There's more than one possible AI on the other end, so crafting something that will not annoy a typical candidate but will lead every AI astray seems pretty difficult.
Maybe you could allow using AI, but only through an interviewer-provided interface. That interface would allow using any model the candidate likes, but before passing the model's response back to the candidate it would inject errors into the code (either randomly or through another AI prompt).
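To make the "inject errors randomly" idea a bit more concrete, here is a rough sketch of what that step could look like. Everything in it is made up for illustration: the `MUTATIONS` table and the `inject_error` function are hypothetical names, and a real interface would need a much smarter mutator (or the second AI prompt mentioned above) to produce bugs that are not trivially spotted.

```python
import random
import re

# Hypothetical mutation table (an assumption, not from any real tool):
# each pattern is swapped for a subtly wrong variant.
MUTATIONS = [
    (re.compile(r"<="), "<"),        # off-by-one in a bound check
    (re.compile(r"=="), "!="),       # inverted equality test
    (re.compile(r"\+ 1\b"), "- 1"),  # flipped increment
]


def inject_error(code: str, rng: random.Random) -> str:
    """Return `code` with one randomly chosen mutation applied.

    If no pattern matches, the code is returned unchanged.
    """
    candidates = []
    for pattern, replacement in MUTATIONS:
        for match in pattern.finditer(code):
            candidates.append((match.start(), match.end(), replacement))
    if not candidates:
        return code
    start, end, replacement = rng.choice(candidates)
    return code[:start] + replacement + code[end:]
```

The proxy would call something like `inject_error(model_reply, random.Random())` on the model's reply before showing it to the candidate. Passing in a seeded `random.Random` keeps a given interview session reproducible, so the interviewer knows which bug was planted.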