Well, I presume you have to figure out how to evaluate their output, especially for trustworthiness. And the core of that is something you have to do yourself, no matter how many AI researchers you have.
The premise of the plan is that evaluating output is easier than producing it, such that a human researcher could look at the AI researcher's output and tell if it's correct and trustworthy. If this is true, what else is there to figure out?