| HN Mirror



	by energy123 352 days ago
	What are the typical context lengths in SWE-bench problems? Does it partly measure performance in the 64-128k context range?

2 comments

whymauri 352 days ago

This is what the rows look like:

https://huggingface.co/datasets/princeton-nlp/SWE-bench_Veri...

It's up to your retrieval system/model to selectively hunt for relevant context. Here are a few critiques of the benchy:

https://x.com/brhydon/status/1953648884309536958


dimitri-vs 352 days ago

IIRC the SWE-bench dataset gives you the full repo snapshot + the issue text, and the evaluation pipelines typically run some kind of retriever (e.g. grep, BM25) to pick a subset of files to place in the model’s context. The provided context is usually limited to ~50k tokens.
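
To make the retrieve-then-truncate idea concrete, here's a rough sketch of BM25 ranking plus greedy packing under a token budget. This is not the actual SWE-bench pipeline: `tokenize`, `bm25_rank`, and `pack_context` are names I made up, the regex tokenizer is a crude stand-in for a real model tokenizer, and the `k1`/`b` values are just common defaults.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Crude identifier-level tokenizer; an assumption, not a real model tokenizer.
    return re.findall(r"[A-Za-z_][A-Za-z0-9_]*", text.lower())


def bm25_rank(query, docs, k1=1.5, b=0.75):
    """Return file paths sorted by BM25 score against the query, best first."""
    tokenized = {path: tokenize(body) for path, body in docs.items()}
    n_docs = len(tokenized)
    avg_len = (sum(len(t) for t in tokenized.values()) / max(n_docs, 1)) or 1.0

    # Document frequency: in how many files does each term appear?
    df = Counter()
    for toks in tokenized.values():
        df.update(set(toks))

    query_terms = set(tokenize(query))
    scores = {}
    for path, toks in tokenized.items():
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores[path] = score
    # sorted() is stable, so ties keep the original file order.
    return sorted(scores, key=scores.get, reverse=True)


def pack_context(query, docs, budget_tokens=50_000):
    """Greedily add top-ranked files, skipping any that would exceed the budget."""
    picked, used = [], 0
    for path in bm25_rank(query, docs):
        cost = len(tokenize(docs[path]))
        if used + cost > budget_tokens:
            continue
        picked.append(path)
        used += cost
    return picked


# Toy "repo snapshot" (file path -> contents) and issue text.
repo = {
    "a.py": "def parse_date(s): return s",
    "b.py": "class Cache: pass",
    "c.py": "import os",
}
issue = "parse_date crashes on empty string"
selected = pack_context(issue, repo, budget_tokens=7)
```

With the budget that small, the file mentioning `parse_date` goes in first and the rest only fit if there's room left over. A real setup would presumably also drop zero-score files and count tokens with the model's own tokenizer.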
