by SkyPuncher
1010 days ago
I don’t think this is a particularly useful benchmark. It’s well known that LLMs are bad at math. Token-based weighting can’t properly account for numbers, which can vary wildly. Numbers are effectively wildcards in the LLM world.
And that (understanding a natural language question) is the USP for LLMs.
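To make the tokenization point concrete, here is a rough sketch of how a subword vocabulary can carve up nearby numbers in unrelated ways. Everything in it is an assumption for illustration: `TOY_VOCAB` and `greedy_tokenize` are made-up names, the vocabulary is hypothetical, and greedy longest-match is a simplified stand-in for the merge-based tokenizers real models use.

```python
# Hypothetical vocabulary for illustration only; not taken from any real model.
TOY_VOCAB = {"100", "10", "20", "23",
             "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}


def greedy_tokenize(text, vocab=TOY_VOCAB):
    """Split text left to right, always taking the longest vocabulary match."""
    tokens = []
    i = 0
    max_len = max(len(t) for t in vocab)
    while i < len(text):
        # Try the longest candidate first, then shrink until something matches.
        for length in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
        else:
            raise ValueError(f"no token covers {text[i]!r}")
    return tokens


if __name__ == "__main__":
    for number in ["1000", "1001", "2023", "2024"]:
        print(number, greedy_tokenize(number))
```

With this toy vocabulary, 2023 and 2024 differ by one but come out with a different number of tokens and different boundaries, so the token sequence by itself carries no consistent notion of place value.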