by ekez
814 days ago
The metric the authors use confuses me. Edit distance seems like a strange way to test whether the model understands arithmetic ([1], Figure 3). If I read it right, `1+3=3` would be scored as just as correct as `1+1=9`? Why not measure how far off the model is, e.g. `abs(actual-expected)`? I wonder if there is an inflection point with that metric.
[1] https://arxiv.org/abs/2206.07682
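To make the comparison concrete, here is a minimal sketch of the two metrics side by side. It assumes a character-level Levenshtein distance over the decimal answer string, which may not match the tokenization or the exact metric used in the paper; `edit_distance` and `abs_error` are illustrative helper names, not anything from the authors' code.

```python
def edit_distance(a: str, b: str) -> int:
    """Character-level Levenshtein distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


def abs_error(predicted: int, expected: int) -> int:
    """Numeric distance between the model's answer and the true answer."""
    return abs(predicted - expected)


# (predicted, expected) pairs; values chosen only for illustration.
cases = [(3, 4), (9, 2), (1000, 999), (199, 999)]
for predicted, expected in cases:
    ed = edit_distance(str(predicted), str(expected))
    ae = abs_error(predicted, expected)
    print(f"predicted={predicted} expected={expected} edit={ed} abs={ae}")
```

Under these assumptions, answering 3 instead of 4 and 9 instead of 2 both get an edit distance of 1, while the absolute errors are 1 and 7. The two metrics can also disagree in the other direction: 1000 instead of 999 is off by 1 numerically but has an edit distance of 4, whereas 199 instead of 999 is off by 800 with an edit distance of 1.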
We don't really know how LLMs do arithmetic. Maybe token edit distance would be interesting, but either way it doesn't really change the claim of the paper.
Unrelated: the link is incorrect. The paper you're referring to is here: https://arxiv.org/pdf/2304.15004.pdf