by slewis 377 days ago
OpenAI created a benchmark for this: https://openai.com/index/paperbench/
1 comment

Still has data contamination though.
Still, LLMs cannot beat it, so it's good enough for a start.