Hacker News
by djfergus 2 days ago
We need a benchmark that tests a model's ability to do LLM research.