|
I fiddled with this prompt:
Calculate 721*436
Step A: Multiplication table.
721*0=0 721*1=721 721*2=1,442 721*3=2,163
721*4=2,884 721*5=3,605 721*6=4,326
721*7=5,047 721*8=5,768 721*9=6,489
Step B: Multiply digits
436=4 hundreds + 3 tens + 6 ones
721*4 hundreds=2,884*100=288,400
721*3 tens=2,163*10=21,630
721*6 ones=4,326*1=4,326
Step C: Add the first two
288,400+21,630=310,030
Then add the last one
310,030+4,326=314,356
That was to get GPT-3 to answer this one correctly (what follows is its own output, though it took some trial and error with the prompt before it stopped making small mistakes):
Calculate 312*481
Step A: Multiplication table.
312*0=0 312*1=312 312*2=624 312*3=936
312*4=1,248 312*5=1,560 312*6=1,872
312*7=2,184 312*8=2,496 312*9=2,808
Step B: Multiply digits
481=4 hundreds + 8 tens + 1 ones
312*4 hundreds=1,248*100=124,800
312*8 tens=2,496*10=24,960
312*1 ones=312*1=312
Step C: Add the first two
124,800+24,960=149,760
Then add the last one
149,760+312=150,072
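For reference, the layout used in the prompt above can be generated mechanically. This is a minimal Python sketch, not the script used for these experiments: the name `scratchpad` is hypothetical, the layout is inferred from the two examples above, and the multiplier is assumed to have exactly three digits.

```python
def scratchpad(a: int, b: int) -> str:
    """Write out a * b in the Step A/B/C layout used in the prompt.

    Hypothetical helper: the layout is inferred from the examples,
    and b is assumed to have exactly three digits.
    """
    if not 100 <= b <= 999:
        raise ValueError("b must be a 3-digit number")
    # Step A: single-digit multiples of a, with thousands separators.
    table = [f"{a}*{d}={a * d:,}" for d in range(10)]
    # Step B: split b into digits and scale each partial product.
    h, t, o = (int(c) for c in str(b))
    ph, pt, po = a * h * 100, a * t * 10, a * o
    lines = [
        f"Calculate {a}*{b}",
        "Step A: Multiplication table.",
        " ".join(table[:4]),
        " ".join(table[4:7]),
        " ".join(table[7:]),
        "Step B: Multiply digits",
        f"{b}={h} hundreds + {t} tens + {o} ones",
        f"{a}*{h} hundreds={a * h:,}*100={ph:,}",
        f"{a}*{t} tens={a * t:,}*10={pt:,}",
        f"{a}*{o} ones={a * o:,}*1={po:,}",
        # Step C: add the partial products one at a time.
        "Step C: Add the first two",
        f"{ph:,}+{pt:,}={ph + pt:,}",
        "Then add the last one",
        f"{ph + pt:,}+{po:,}={ph + pt + po:,}",
    ]
    return "\n".join(lines)


print(scratchpad(312, 481))
```

Called with 312 and 481, this reproduces the lines above. Because every partial product is written out, a zero digit in the multiplier still gets its own line in Step C (for example `200,700+0=200,700` for 669*306).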
I then tested it with two random 3-digit numbers:
Calculate 669*306
Step A: Multiplication table.
669*0=0 669*1=669 669*2=1,338 669*3=2,007
669*4=2,676 669*5=3,345 669*6=4,014
669*7=4,683 669*8=5,352 669*9=6,021
Step B: Multiply digits
306=3 hundreds + 0 tens + 6 ones
669*3 hundreds=2,007*100=200,700
669*0 tens=0*10=0
669*6 ones=4,014*1=4,014
Step C: Add the first two
200,700+4,014=204,714
Then add the last one
204,714+306=205,020
204,714 is correct, but then it gets confused in the very last line because of the zero: having skipped the zero partial product in Step C, it adds 306, the multiplier itself.

Anyway, this, I think, proves that GPT-3 can do zero-shot math if it doesn't have to do it in its head all at once. Is it good for a calculator? No. Is it good for a language model trained on web text? Yes, absolutely.

E: Correcting the start of Step C in that last example to “200,700+0”, and replacing “XYZ=X hundreds + Y tens + Z ones” with “XYZ=X Y Z=X hundreds + Y tens + Z ones”, allowed it to do 145*585, 961*761 and 592*555 correctly in a row. All three were randomly chosen, and at least the last two were tried without changes to the prompt. I consider this an adequate test, and it demonstrates GPT-3's algorithm-following abilities. As GPT-3 is still a tiny model, this seems important to note.

E2: To be clear, this is still nowhere near 100% successful. GPT-3 still makes a lot of errors. I ran 100 tries of a slightly different prompt through the API and got a success rate of 42%. |
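A success rate like the 42% above is just the fraction of completions whose final line ends in the true product. Here is a rough Python sketch of that check. The names `final_answer` and `success_rate` are hypothetical, the final answer is assumed to be whatever follows the last `=` in the completion, and this is not the script actually used for the 100 tries.

```python
from typing import Iterable, Optional, Tuple


def final_answer(completion: str) -> Optional[int]:
    """Return the number after the last '=' in the completion, or None."""
    tail = completion.strip().rsplit("=", 1)[-1]
    digits = tail.replace(",", "").strip()
    return int(digits) if digits.isdigit() else None


def success_rate(trials: Iterable[Tuple[int, int, str]]) -> float:
    """Fraction of (a, b, completion) trials whose final answer is a * b."""
    trials = list(trials)
    if not trials:
        return 0.0
    correct = sum(final_answer(text) == a * b for a, b, text in trials)
    return correct / len(trials)


# The last lines of the two model outputs quoted in this post.
trials = [
    (312, 481, "Then add the last one\n149,760+312=150,072"),
    (669, 306, "Then add the last one\n204,714+306=205,020"),
]
print(success_rate(trials))  # 0.5
```

Only the final number is checked here, so a run that slips in Step B but still lands on the right total would count as a success. A stricter check would compare every line against the expected scratchpad.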
Very interesting! This is what I would expect. It can run a symbolic algorithm fine; just give it some scratch space to work out the intermediate results. I feel like there's a very large space to optimize the layout "algorithm" -- like how you adjusted Step C -- to produce reliable results.