by kostaj
21 days ago
This paper covers only the disagreement between models. It establishes a floor on the error, based on that disagreement, but not which model is better. I'm planning to follow up with another study that benchmarks against human-labelled verdicts, still using a corpus that the models have not seen during training.