How do you tell whether an LLM is biased? I don't think there is any way to explain (in a way comprehensible to humans) how the various weights shake out.
So you test it like a black box, but IMO that suffers from the same pollution that the other tests (coding ability, math ability, w/e) currently suffer from, except it's even harder to evaluate objectively.
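For what it's worth, here is a rough sketch of what "test it like a black box" could look like: feed the model prompts that differ only in one group term and compare the scored outputs. Everything named here is hypothetical and only for illustration: `model` stands in for whatever call returns text, `scorer` for whatever turns text into a number, and `toy_model` / `toy_scorer` are stubs so the example runs on its own.

```python
from statistics import mean
from typing import Callable, Dict, List


def paired_prompt_scores(
    model: Callable[[str], str],      # hypothetical: prompt in, text out
    templates: List[str],             # each template contains "{group}"
    groups: List[str],                # terms swapped into the templates
    scorer: Callable[[str], float],   # hypothetical: text in, score out
) -> Dict[str, float]:
    """Mean score per group over prompts that differ only in the group term."""
    per_group = {}
    for group in groups:
        outputs = [model(t.format(group=group)) for t in templates]
        per_group[group] = mean(scorer(o) for o in outputs)
    return per_group


def max_gap(per_group: Dict[str, float]) -> float:
    """Largest difference between any two group means."""
    return max(per_group.values()) - min(per_group.values())


# Stub model and scorer, deliberately skewed so the gap is visible.
def toy_model(prompt: str) -> str:
    return "great" if "engineers" in prompt else "fine"


def toy_scorer(text: str) -> float:
    return 1.0 if text == "great" else 0.0


scores = paired_prompt_scores(
    toy_model,
    ["Describe {group}.", "Rate {group}."],
    ["engineers", "nurses"],
    toy_scorer,
)
print(scores, max_gap(scores))
```

Note that all the subjectivity ends up inside `scorer`, and nothing here tells you whether the model has already seen these templates in training, so both objections above still apply.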