Hacker News
by mquander 1100 days ago
I just looked again and didn't see that claim. Can you verify? https://arxiv.org/pdf/2306.08997.pdf

If, as per the linked critique, some of the questions in the test set were basically nonsense, then clearly they couldn't have manually verified all the answers, or they would have noticed that.

>We then process the data by manually correcting each question and answer to ensure quality and correctness

Section 2.1

The GitHub repo also has wording around this:

> We double-verify manually that the grading of the test set is correct. https://github.com/idrori/MITQ/blob/main/index.html#L552

I agree it looks like this may not actually have been done, given some of the questions and answers in the dataset.