

by Closi 1171 days ago

I think we are probably just evaluating the paper on different metrics too :)

I think my view is just that if your paper is called "Evaluating the Logical Reasoning Ability of GPT-4" and your conclusion is "logical reasoning remains challenging for GPT4", then the paper should contain something more objective to back up that statement, particularly when its findings appear to show that GPT-4 performs better at logical reasoning than anything else the paper identifies to date.

It's supposed to be an academic paper, not a Tumblr post.

1 comment

skybrian 1171 days ago

How do you make an objective statement about how well GPT-4 does logical reasoning?

Running benchmarks seems like a reasonable way to do it. The objective statements are the benchmark results. They are there. That's the main result of the paper.


Closi 1169 days ago

You can make objective statements by benchmarking, but by the nature of benchmarking you need something that scores higher to compare against before you can conclude that something is performing poorly.

Benchmarking is comparative - that’s the whole point - so the conclusions aren’t actually backed up by the paper.
