by natch 1100 days ago
>compare each of their performance against supposed ground truth labels.

Fixed it for you.


I mean, sure. For ground truth, we are using the labels that are part of the original dataset:

* https://huggingface.co/datasets/banking77
* https://huggingface.co/datasets/lex_glue/viewer/ledgar/train
* https://huggingface.co/datasets/squad_v2
* ... (exhaustive set of links at the end of the report)

Is there some noise in these labels? Sure! But relative performance measured against these labels is still a valid evaluation.
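To make the noisy-label point concrete, here is a minimal simulation sketch. It is not from the report: the two models, their accuracies, the 10% noise rate, and the helper names (`wrong_class`, `simulate`) are all hypothetical. It assumes label noise is uniform and independent of either model's errors; if the noise were correlated with one model's mistakes, the ranking could shift.

```python
import random


def wrong_class(rng, truth, k):
    """Return a class in [0, k) that differs from `truth`, chosen uniformly."""
    return (truth + rng.randrange(1, k)) % k


def simulate(true_acc_a=0.90, true_acc_b=0.80, noise=0.10, n=20000, k=5, seed=0):
    """Score two hypothetical models against noisy labels.

    Each example has a hidden true class. The dataset label is flipped to a
    different class with probability `noise`. Each model predicts the true
    class with its own accuracy, otherwise a uniformly random wrong class.
    Returns the accuracy of each model as measured against the noisy labels.
    """
    rng = random.Random(seed)
    correct_a = 0
    correct_b = 0
    for _ in range(n):
        truth = rng.randrange(k)
        # Dataset label, possibly corrupted (independent of the models).
        label = wrong_class(rng, truth, k) if rng.random() < noise else truth
        pred_a = truth if rng.random() < true_acc_a else wrong_class(rng, truth, k)
        pred_b = truth if rng.random() < true_acc_b else wrong_class(rng, truth, k)
        correct_a += pred_a == label
        correct_b += pred_b == label
    return correct_a / n, correct_b / n


if __name__ == "__main__":
    measured_a, measured_b = simulate()
    print(f"model A measured accuracy: {measured_a:.3f}")
    print(f"model B measured accuracy: {measured_b:.3f}")
```

Under these assumptions both measured accuracies come out lower than the true ones, but the stronger model still scores higher, which is the sense in which the relative comparison holds.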

Agreed, thanks for highlighting these links!