by charlieyu1, 310 days ago
Even the maths benchmarks only checked the final numerical answer against the ground truth, which means an LLM can output a lot of nonsense in its working and still pass just by guessing the correct answer.
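To make the point concrete, here is a minimal sketch of answer-only grading. It assumes a grader that simply takes the last number in the response and compares it to the ground truth. The function names are hypothetical and this is not any particular benchmark's actual harness:

```python
import re
from typing import Optional


def extract_final_number(response: str) -> Optional[str]:
    """Return the last number appearing in the response, or None if there is none."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", response)
    return numbers[-1] if numbers else None


def answer_only_grade(response: str, ground_truth: str) -> bool:
    """Hypothetical grader: correct iff the final number equals the ground truth.

    Everything before the final number (the "reasoning") is never inspected.
    """
    final = extract_final_number(response)
    return final is not None and float(final) == float(ground_truth)


sound = "2x + 3 = 11, so 2x = 8 and x = 4. Answer: 4"
nonsense = "Since 2 + 3 = 7 and 7 squared is 12, the answer is 4"

print(answer_only_grade(sound, "4"))     # True
print(answer_only_grade(nonsense, "4"))  # True, despite the garbage working
```

Both responses get full marks, because a grader like this has no way to tell valid working from made-up working that happens to land on the right number.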