Hacker News
by wongarsu 71 days ago
Most of the 'coding benchmarks' are deeply flawed too. This one at least makes that explicit.

And so far, the ability to make SVGs of $animal on $vehicle seems to correlate surprisingly well with model 'intelligence'.