by amrb 1144 days ago
I'd like to see a yearly benchmark for models. It could be logic puzzles or a suite of tasks, but as it stands there is no good way to measure the ability of models.