Hacker News
by hopinhopout 70 days ago
LLMs are really causing serious brainrot if HTML pelican drawings are the basis for picking a model for your programming projects. And all these shitty benchmarks don't say or mean anything anyway if companies secretly tweak them on the go.
1 comment

Most of the 'coding benchmarks' are deeply flawed too. This one at least makes it explicit.

And so far, the ability to make SVGs of $animal on $vehicle seems to correlate surprisingly well with model 'intelligence'.