Hacker News
by SkyPuncher 1010 days ago
I don’t think this is a particularly useful benchmark.

It’s well known that LLMs are bad at math. Token-based weighting can’t properly account for numbers, which can vary wildly. Numbers are effectively wildcards in the LLM world.
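A rough sketch of the tokenization point, assuming a greedy longest-match tokenizer over a made-up vocabulary. `TOY_VOCAB` and `tokenize_digits` are hypothetical names for illustration, not any real model's tokenizer or vocabulary. It only shows how two nearly identical numbers can end up as very different token sequences:

```python
# Toy vocabulary: single digits plus a few multi-digit chunks.
# This is invented for illustration and does not match any real model.
TOY_VOCAB = {str(d) for d in range(10)} | {"12", "100", "345", "2023"}


def tokenize_digits(text, vocab=TOY_VOCAB):
    """Split a digit string into tokens by greedy longest match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining piece first, then shrink it.
        for j in range(len(text), i, -1):
            piece = text[i:j]
            if piece in vocab:
                tokens.append(piece)
                i = j
                break
        else:
            raise ValueError(f"no token matches at position {i}: {text[i]!r}")
    return tokens


print(tokenize_digits("2023"))   # ['2023']
print(tokenize_digits("2024"))   # ['2', '0', '2', '4']
print(tokenize_digits("12345"))  # ['12', '345']
print(tokenize_digits("12346"))  # ['12', '3', '4', '6']
```

In this toy setup, 2023 is one token while 2024 is four, so the model never sees a consistent place-value structure. Real tokenizers differ in the details.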

1 comment

Surely this is a "didn't read the question properly" problem rather than a "didn't maths right" problem?

And that (understanding a natural language question) is the USP for LLMs.