Hacker News
by amelius 107 days ago
There should be a way to turn the questions we ask LLMs into benchmarks.

That way, we can have a benchmark that is always up to date.

1 comment

There are a few “updating” benchmarks out there. I periodically take a look at these two:

https://swe-rebench.com/

https://livebench.ai/