Hacker News
by
slewis
377 days ago
OpenAI created a benchmark for this:
https://openai.com/index/paperbench/
1 comment
suddenlybananas
377 days ago
Still has data contamination though.
Szpadel
376 days ago
Still, LLMs can't beat it, so it's good enough for a start.