by infogulch
1595 days ago
Would it do better if you asked it to "show its work"? I.e., work it out in long form, one step at a time, like you'd ask a schoolkid to do. Maybe an example prompt would look like this:
Work out 2241 + 19873.
02241 + 19873 ~ ____4
02241 + 19873 ~ ___14 carry 1
02241 + 19873 ~ __114 carry 1
02241 + 19873 ~ _2114 carry 1
02241 + 19873 = 22114.
I'm not sure what the best way is to represent each step, including details like carry digits. And you'd have to design a separate scheme for each operation.
If these models are symbol manipulators, maybe the key is to break the task down into steps that are closer to being solvable with symbol manipulation.
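For what it's worth, here is a rough sketch of how you might generate prompts in that long-form layout automatically. The function name long_addition_steps is made up for illustration, it assumes non-negative integers, and the step format is just one guess at a scheme, copied from the example above.

```python
def long_addition_steps(a: int, b: int) -> list[str]:
    """Return the column-by-column steps for a + b, rightmost digit first.

    Hypothetical helper: assumes a and b are non-negative integers.
    """
    total = a + b
    # Pad both operands to the width of the result so a final carry always fits.
    width = len(str(total))
    left, right = str(a).zfill(width), str(b).zfill(width)

    lines, digits, carry = [], "", 0
    for pos in range(width - 1, -1, -1):
        # Add the two digits in this column plus any carry from the previous one.
        carry, digit = divmod(int(left[pos]) + int(right[pos]) + carry, 10)
        digits = str(digit) + digits
        if pos == 0:
            # Leftmost column: every digit is known, so state the final answer.
            lines.append(f"{left} + {right} = {digits}.")
        else:
            # Unknown digits are shown as underscores; note a carry when there is one.
            note = " carry 1" if carry else ""
            lines.append(f"{left} + {right} ~ {digits.rjust(width, '_')}{note}")
    return lines


print("Work out 2241 + 19873.")
print("\n".join(long_addition_steps(2241, 19873)))
```

Subtraction, multiplication, and so on would each need their own step format, which is the "separate scheme for each operation" problem.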
|
There are ways to fix these issues, but BPEs micro-optimize for the primary text benchmarks that papers want good scores on, so they are the standard for now. I'm sure they'll get replaced eventually, once the costs outrun the wins and more scalable (alternatives to?) transformers become popular.
[1] https://news.ycombinator.com/item?id=30309302