Hacker News
by amrb 1144 days ago
I'd like to see a yearly benchmark for models. It could be logic puzzles or a suite of tasks, but as it stands there is no good way to measure the ability of models.