by Closi
1171 days ago
I think we are probably just evaluating the paper on different metrics too :) My view is that if your paper is called "Evaluating the Logical Reasoning Ability of GPT-4" and your conclusion is "logical reasoning remains challenging for GPT4", then you should have something more objective in your paper to back up that statement, particularly if the findings appear to be that it performs better at logical reasoning than anything else the paper identifies to date. It's supposed to be an academic paper, not a Tumblr post.
|
Running benchmarks seems like a reasonable way to do it. The objective statements are the benchmark results. They are there. That's the main result of the paper.