HN Mirror



	by slewis 377 days ago
	OpenAI created a benchmark for this: https://openai.com/index/paperbench/


Still has data contamination though.

Still, no LLM can beat it yet, so it's good enough for a start.