HN Mirror



	by morelisp 1100 days ago
	How was ground truth obtained if not via human annotation?

2 comments

natch 1100 days ago

Some data is self-annotating. You can count occurrences of features in context, and then you know with increasing confidence that a feature occurs in that context. You can also build up meaning with more observations of events in context. Sounds circular, but notice no human is required in this process. Sure, in other cases or for other steps humans can be useful, but they aren't always needed for ground truth.
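
A rough sketch of the counting idea, for illustration only: it counts how often one token directly follows another in raw text and attaches a Wilson score interval, so the estimate tightens as more text is observed. The toy sentences and the names `wilson_interval` and `cooccurrence_estimate` are made up for this example, and treating "feature in context" as "next word after a given word" is a simplifying assumption, not something the comment specifies.

```python
import math

def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is ~95%)."""
    if trials == 0:
        return (0.0, 1.0)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))

def cooccurrence_estimate(sentences, context, feature):
    """Count how often `feature` directly follows `context`; no human labels used."""
    context_count = 0
    pair_count = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            if prev == context:
                context_count += 1
                if nxt == feature:
                    pair_count += 1
    return pair_count, context_count, wilson_interval(pair_count, context_count)

# Hypothetical toy corpus: "york" follows "new" in 3 of 4 cases.
toy = ["new york is big", "new york is busy", "new ideas help", "new york wins"]
small = cooccurrence_estimate(toy, "new", "york")       # 4 observations of "new"
large = cooccurrence_estimate(toy * 50, "new", "york")  # 200 observations of "new"
```

The proportion is the same in both runs, but the interval for `large` is much narrower than for `small`, which is the "increasing confidence" part.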


nihit-desai 1100 days ago

Hi, one of the authors here. Good question! For this benchmarking, we evaluated performance on popular open source text datasets across a few different NLP tasks (details in the report).

For each of these datasets, we specify task guidelines/prompts for the LLM and human annotators, and compare each of their performance against ground truth labels.


morelisp 1100 days ago

You didn't answer the question at all, although, to be fair, the answer is both obvious and completely undermines your claim, so I can see why you wouldn't.


natch 1100 days ago

>compare each of their performance against supposed ground truth labels.

Fixed it for you.


nihit-desai 1100 days ago

I mean, sure. For ground truth, we are using the labels that are part of the original dataset:

* https://huggingface.co/datasets/banking77
* https://huggingface.co/datasets/lex_glue/viewer/ledgar/train
* https://huggingface.co/datasets/squad_v2

... (exhaustive set of links at the end of the report).

Is there some noise in these labels? Sure! But the relative performance with respect to these is still a valid evaluation.
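
To make the "relative performance" point concrete, here is a small simulation sketch. It is not the benchmark's method. It assumes binary labels, reference-label noise that is random and independent of both annotators, and made-up accuracy figures (0.9 and 0.7) and noise rate (0.2). Under those assumptions the ranking of two annotators survives even though both measured scores drop. If the noise were correlated with one annotator's errors (for example, reference labels that share a human annotator's biases), this argument would not hold.

```python
import random

def simulate(n=20000, noise=0.2, acc_a=0.9, acc_b=0.7, seed=0):
    """Compare two hypothetical annotators against clean and noisy reference labels."""
    rng = random.Random(seed)

    def flip(label, prob):
        # Flip a binary label with probability `prob`.
        return 1 - label if rng.random() < prob else label

    truth = [rng.randint(0, 1) for _ in range(n)]
    reference = [flip(y, noise) for y in truth]      # noisy "ground truth"
    ann_a = [flip(y, 1 - acc_a) for y in truth]      # annotator A, ~acc_a accurate
    ann_b = [flip(y, 1 - acc_b) for y in truth]      # annotator B, ~acc_b accurate

    def agreement(pred, ref):
        return sum(p == r for p, r in zip(pred, ref)) / n

    return {
        "true_a": agreement(ann_a, truth),
        "true_b": agreement(ann_b, truth),
        "measured_a": agreement(ann_a, reference),
        "measured_b": agreement(ann_b, reference),
    }

result = simulate()
```

With independent noise rate e, an annotator with true accuracy a is expected to agree with the noisy reference at a rate of a(1 - e) + (1 - a)e, so the gap between annotators shrinks by a factor of (1 - 2e) but keeps its sign as long as e < 0.5.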


natch 1100 days ago

Agreed, thanks for highlighting these links!
