A human interviewer can be held responsible for their actions; a machine, so far, cannot. Outside of the potential for cutting costs, abdication of responsibility is the number one reason we're looking to adopt these systems.
Humans have a much greater diversity of bias because we have all lived our own unique lives. LLMs are incredibly limited by contrast. Even if you were somehow to simulate that diversity by training separate LLMs on different subsets of the human knowledge corpus, you would need billions of subsets to match the diversity of human bias.
Wisdom of the crowd also implies that diversity of human bias is a good thing, in aggregate.
To address your point more directly: if all companies use the same LLM, they’ll all have the same hiring bias. But if Company Foo has Hiring Manager Bob who’s biased against me, I can shoot my shot at Company Bar with Hiring Manager Alice, who might not be.
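To put rough numbers on that, here is a toy sketch in Python. It assumes each screener is biased against a given candidate with some probability p (the 0.2 below is made up), and that human hiring managers are independent of one another, which real ones only partly are. The function name and parameters are hypothetical, chosen just for this illustration.

```python
def p_rejected_everywhere(p_biased: float, n_companies: int, shared_model: bool) -> float:
    """Probability that the screener at every company is biased against one candidate.

    Toy model: p_biased is an assumed, made-up probability, not measured data.
    """
    if shared_model:
        # One model means one bias: the outcome is identical at every company,
        # so applying to more companies does not help.
        return p_biased
    # Independent human screeners: all n of them must be biased at once.
    return p_biased ** n_companies


if __name__ == "__main__":
    p = 0.2  # assumed chance that any single screener is biased against you
    for n in (1, 5, 10):
        shared = p_rejected_everywhere(p, n, shared_model=True)
        independent = p_rejected_everywhere(p, n, shared_model=False)
        print(f"{n:2d} companies: shared LLM {shared:.6f}, independent humans {independent:.6f}")
```

Under these assumptions the shared-LLM number stays flat no matter how many companies you apply to, while the independent-humans number shrinks with each extra application.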
LLMs have no awareness of their own bias, and no incentive or ability to mitigate it. A human can, in theory, realize "hey, I tend to be a little harsh on <demographic>, is this negative judgement just that?" while an LLM could never.
In practice I doubt many people are aware of their biases either, or they think "it's not bias if it's true" or something. But at least among the less "internally" biased humans, less of it will show up in their decisions.
LLMs don't have any concept of their "personal" bias, so they'd imitate whatever training data they received that was tagged as not being biased, if there even was any.
So you might think, but no. The LLM contains a large number of biases, coming from different training texts. Depending on how you structure the question, you can get biased statements.
For instance, if I discuss audio electronics with Google Gemini, depending on what kinds of questions I ask, I can get audiophile crackpot quackery out of it, or I can get solid electronic engineering statements.
The training data contains a vast number of narratives that are filled with different points of view. Generally speaking, you get the ones that resonate with your own narrative threaded through your prompts.
One way this happens is if you ask loaded questions: questions which assume that some statements hold true, and seek clarification within that context. If the AI hasn't been system-prompted or fine-tuned to push back on that topic, it may just take those assumptions at face value, and then produce token predictions out of narratives which express similar assumptions.