by wahnfrieden
85 days ago
That’s what makes it a fair evaluation, and something that requires improvement. We shouldn’t evaluate agent skills only on what is most commonly represented in training data. We expect them to perform in areas where the existing training data may be deficient. You don’t need to invent an absurdity to find these cases.

The issue is that people claim the performance is representative of a human's performance in the same situation. That gives an inaccurate overall estimate of ability.