Hacker News
by djfergus 2 days ago
We need a benchmark that tests a model's ability to do LLM research.