
I wonder the same thing. If any academic reading this wants a paper idea:

1. Examine papers and other sources claiming that an LLM gets something wrong that a human would have gotten right. How many of those claims cite any data on how many humans actually get it wrong? How many of those citations use the general population rather than the population best suited to answering the question correctly (e.g. people who signed up for the GRE are more likely to get GRE questions right than the general population)?

2. For claims that lack any citation on human performance, run some tests with humans from the general population (or as close as you can get), and see how the LLMs compare.