To me it's the lack of skill. If the LLM spits out junk, you should be able to tell. ChatGPT-based interviews could work just as well for assessing a candidate's ability to understand, review, and fix code effectively.
> If the LLM spits out junk you should be able to tell.
Reading existing code and ensuring correctness is way harder than writing it yourself. How would someone who can't do it in the first place tell if it was incorrect?
Make the model write annotated tests too: check that each annotation plausibly matches its test code, run the tests, feed the failures back in, and iterate until all tests are green?