Hacker News
by subroutine 79 days ago
At 20 min per task, you might as well code it yourself. Bill James needs to write a book on sabermetrics for LLM benchmarks.