| HN Mirror


	by aeternum 1100 days ago
	In the paper, they at least claimed to manually verify the correct answers.

mquander 1100 days ago

I just looked again and I didn't see that claim, can you verify? https://arxiv.org/pdf/2306.08997.pdf

If, as per the linked critique, some of the questions in the test set were basically nonsense, then clearly they couldn't have manually verified all the answers, or they would have noticed.

aeternum 1100 days ago

>We then process the data by manually correcting each question and answer to ensure quality and correctness

Section 2.1

The GitHub repo also has wording around this:

> We double-verify manually that the grading of the test set is correct. https://github.com/idrori/MITQ/blob/main/index.html#L552

I agree it looks like this may not have actually been done given some of the questions and answers in the dataset.

sanderjd 1100 days ago

Then - having not read the paper - what is the point of the automated grading?

riffraff 1100 days ago

To avoid spending time manually grading the obviously incorrect ones (i.e., only grading 1/18 of them by hand).

sanderjd 1100 days ago

Got it!
