The trick is to phrase the problem so that GPT4 will always give an incorrect answer (because the problem statement is vague) and so that multiple rounds of guiding/correcting are needed to solve it.
That works well because the back-and-forth can exhaust the context window quickly, at which point the model starts spiraling out of control and the candidate has to step in.
If you only use ChatGPT to code, all you can do is copy-paste the LLM-emitted code and then ask for changes to that code (for example, to reflect the evolution of the product)