by morelisp 1100 days ago
How was ground truth obtained if not via human annotation?
2 comments

Some data is self-annotating. You can count occurrences of features in context, and with each observation you know with increasing confidence that a feature occurs in that context. You can also build up meaning by observing more events in context. Sounds circular, but notice no human is required in this process. Sure, in other cases or for other steps humans can be useful, but they aren't always needed for ground truth.
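To make the counting idea concrete, here is a minimal sketch, assuming "context" means the previous word and "feature" means the word that follows it. The function names and the tiny corpus are made up for illustration; nothing here comes from the report.

```python
from collections import Counter, defaultdict


def count_features_in_context(sentences):
    """Map each context word to a Counter of the words observed right after it."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Each adjacent pair is one observation: (context, feature).
        for context, feature in zip(tokens, tokens[1:]):
            counts[context][feature] += 1
    return counts


def most_likely_feature(counts, context):
    """Return (feature, relative frequency) for a context, or None if unseen."""
    seen = counts.get(context)
    if not seen:
        return None
    feature, n = seen.most_common(1)[0]
    return feature, n / sum(seen.values())


# Hypothetical toy corpus; the "labels" fall out of the text itself.
corpus = [
    "the bank approved the loan",
    "the bank approved the card",
    "the bank declined the loan",
]
counts = count_features_in_context(corpus)
print(most_likely_feature(counts, "bank"))
```

No human labels anything here: the next word acts as the label, and the relative frequency firms up as more text is observed.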
Hi, one of the authors here. Good question! For this benchmarking, we evaluated performance on popular open source text datasets across a few different NLP tasks (details in the report).

For each of these datasets, we specify task guidelines/prompts for the LLM and human annotators, and compare each of their performance against ground truth labels.
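A rough sketch of that kind of comparison, assuming a plain classification task scored by exact-match accuracy. The label names and all three label lists are invented toy data, not results from the report, and the report may use different metrics per task.

```python
def accuracy(predictions, ground_truth):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Made-up example data: the dataset's own labels serve as the reference.
ground_truth = ["card_lost", "transfer_failed", "card_lost", "balance", "balance"]
llm_labels = ["card_lost", "transfer_failed", "card_lost", "balance", "card_lost"]
human_labels = ["card_lost", "balance", "card_lost", "balance", "card_lost"]

# Both annotators are scored against the same reference labels.
for name, labels in [("llm", llm_labels), ("human", human_labels)]:
    print(name, accuracy(labels, ground_truth))
```

Both sets of annotations are scored against the same reference, so any error in the reference labels affects both scores.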

You didn't answer the question at all, although to be fair the answer is obvious and completely undermines your claim, so I can see why you wouldn't.
>compare each of their performance against supposed ground truth labels.

Fixed it for you.

I mean, sure. For ground truth, we are using the labels that are part of the original dataset:
* https://huggingface.co/datasets/banking77
* https://huggingface.co/datasets/lex_glue/viewer/ledgar/train
* https://huggingface.co/datasets/squad_v2
... (exhaustive set of links at the end of the report).

Is there some noise in these labels? Sure! But relative performance measured against these labels is still a valid evaluation.
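One way to see when this holds, as a sketch under strong assumptions: a binary task, labels flipped at random with a fixed rate, and flips independent of each annotator's own mistakes. The function name and the numbers are illustrative only. If the label noise is correlated with one annotator's errors (for example, because the labels were themselves produced by humans), this argument does not apply.

```python
def observed_accuracy(true_accuracy, noise_rate):
    """Expected agreement with noisy labels on a binary task.

    Assumes symmetric label noise that is independent of the annotator's errors.
    """
    if not 0.0 <= true_accuracy <= 1.0:
        raise ValueError("true_accuracy must be in [0, 1]")
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must be in [0, 1]")
    # The annotator agrees with the noisy label when it is right and the label
    # was not flipped, or when it is wrong and the label was flipped.
    return true_accuracy * (1 - noise_rate) + (1 - true_accuracy) * noise_rate


# Illustrative numbers: two annotators scored against labels with 10% noise.
better = observed_accuracy(0.9, 0.1)
worse = observed_accuracy(0.8, 0.1)
print(better > worse)
```

Under these assumptions the measured score is a linear function of the true accuracy with slope 1 - 2 * noise_rate, so the ranking is preserved whenever the noise rate is below 0.5, even though the absolute numbers shrink toward 0.5.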

Agreed, thanks for highlighting these links!