HN Mirror



	by wsdookadr 806 days ago
	What makes you say it's not representative?

2 comments

skywhopper 806 days ago

SWE-bench Lite is a subset of extremely simple issues from a cherry-picked subset (SWE-bench) of a handful of large (presumably well-run) Python-only projects.

Here are some rules they used to trim down the SWE-bench Lite problems:

* We remove instances with images, external hyperlinks, references to specific commit shas and references to other pull requests or issues.

* We remove instances that have fewer than 40 words in the problem statement.

* We remove instances that edit more than 1 file.

* We remove instances where the gold patch has more than 3 edit hunks (see patch).

See https://www.swebench.com/lite.html
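For a sense of how much these rules exclude, here is a rough sketch of them as a filter. It is not the SWE-bench authors' code: the `Instance` record, its field names, and the regex heuristics for spotting links, images, commit SHAs and issue references are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass


@dataclass
class Instance:
    # Hypothetical record; field names are illustrative, not the real schema.
    problem_statement: str
    patch: str  # gold patch as a unified diff in git format


# Crude heuristics (assumptions), not the criteria the authors actually used.
URL_RE = re.compile(r"https?://")
IMAGE_RE = re.compile(r"!\[[^\]]*\]\(|<img\b", re.IGNORECASE)
# A hex run of 7-40 characters containing at least one digit.
SHA_RE = re.compile(r"\b(?=[0-9a-f]*\d)[0-9a-f]{7,40}\b")
ISSUE_REF_RE = re.compile(r"#\d+")


def files_edited(patch: str) -> int:
    # One "diff --git" header per edited file.
    return sum(1 for line in patch.splitlines() if line.startswith("diff --git "))


def hunk_count(patch: str) -> int:
    # One "@@ ... @@" header per edit hunk.
    return sum(1 for line in patch.splitlines() if line.startswith("@@ "))


def keep(inst: Instance) -> bool:
    text = inst.problem_statement
    if URL_RE.search(text) or IMAGE_RE.search(text):
        return False  # images or external hyperlinks
    if SHA_RE.search(text) or ISSUE_REF_RE.search(text):
        return False  # commit shas or references to other PRs/issues
    if len(text.split()) < 40:
        return False  # fewer than 40 words in the problem statement
    if files_edited(inst.patch) > 1:
        return False  # edits more than 1 file
    if hunk_count(inst.patch) > 3:
        return False  # gold patch has more than 3 edit hunks
    return True
```

Something like `[i for i in instances if keep(i)]` would then give the trimmed set, where `instances` is whatever list of records you loaded.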


kevindamm 805 days ago

That's... rather limiting.


arp242 806 days ago

Look at the data. Does that seem like the average bug report to you?


falcor84 806 days ago

It would help if you were to provide a specific example or two.


arp242 806 days ago

You can't demonstrate whether a dataset is representative or not by "an example or two". You need to look at all the data.

And all of this is fine. It's just a benchmark suite and doesn't need to be fully representative. The dataset itself doesn't even claim to be that, as far as I can find. All I'm saying is that the title wasn't really accurate.
