| HN Mirror



	by mlb_hn 1630 days ago
	I get the tokenization argument, and it may influence things a bit, but I suspect the n-digit math issue has more to do with search, i.e. the way the model samples (in the BPE link, gwern references some experiments I'd done on improving n-digit math by chunking with commas, http://gptprompts.wikidot.com/logic:math). I think that since it samples left to right on the first pass, it can't predict well when carries propagate from right to left. I think you can mitigate the search issue a bit if you have the prompt double-check itself after the fact (e.g. https://towardsdatascience.com/1-1-3-wait-no-1-1-2-how-to-ha...). It works differently depending on the size of the model, though.
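To make the carry point concrete, here is a minimal sketch of schoolbook addition on digit strings. It is not taken from the linked experiments; the names `add_right_to_left` and `chunk_with_commas` are illustrative assumptions. It shows how the leading digit of a sum can depend on the rightmost input digits, and what comma chunking of an operand looks like.

```python
def add_right_to_left(a: str, b: str) -> tuple[str, int]:
    """Schoolbook addition on digit strings.

    Returns (sum as a string, length of the longest carry chain).
    """
    width = max(len(a), len(b))
    a, b = a.zfill(width), b.zfill(width)
    digits = []
    carry = chain = longest = 0
    # Walk from the least significant digit, as a human would.
    for da, db in zip(reversed(a), reversed(b)):
        total = int(da) + int(db) + carry
        digits.append(str(total % 10))
        carry = total // 10
        chain = chain + 1 if carry else 0
        longest = max(longest, chain)
    if carry:
        digits.append(str(carry))
    return "".join(reversed(digits)), longest


def chunk_with_commas(n: int) -> str:
    """Format an integer in comma-separated groups of three digits."""
    return f"{n:,}"
```

With these, `add_right_to_left("4998", "1")` and `add_right_to_left("4999", "1")` differ only in the last input digit, yet the first output digit changes from 4 to 5, and that first digit is the one a left-to-right sampler has to commit to before anything else.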

2 comments

moyix 1630 days ago

Yup, quite possible that this has something to do with it. There is other work showing that giving LMs a "scratchpad" for intermediate computations allows them to do much better not just at arithmetic but also things like predicting the output of some code: https://arxiv.org/abs/2112.00114
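As a rough illustration of what a scratchpad for addition might contain, the sketch below writes out each digit step and carry before the final answer. The trace format and the function name `addition_scratchpad` are assumptions made for this example, not the format used in the linked paper.

```python
def addition_scratchpad(a: int, b: int) -> str:
    """Build a step-by-step addition trace for two non-negative integers."""
    lines = [f"Input: {a} + {b}", "Scratchpad:"]
    x, y, carry, partial = a, b, 0, ""
    while x > 0 or y > 0 or carry:
        da, db = x % 10, y % 10
        total = da + db + carry
        partial = str(total % 10) + partial
        lines.append(
            f"  {da} + {db} + carry {carry} = {total}; "
            f"write {total % 10}, carry {total // 10}; so far: {partial}"
        )
        carry = total // 10
        x //= 10
        y //= 10
    lines.append(f"Answer: {partial or '0'}")
    return "\n".join(lines)
```

The idea is that each line only needs local information (two digits and one carry), so the final answer can be read off the trace instead of being predicted in one left-to-right pass.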


mlb_hn 1630 days ago

Definitely. It also works on text translation/comprehension, like emojis! https://aidungeon.medium.com/introducing-ai-dungeon-translat.... For actual benchmarks, a scratchpad improves GPT-Davinci's accuracy on WiC from 50% (chance) to nearly 70%.

I think check-and-validate is a different sort of scratchpad, but maybe not. Seems like there are at least 3 types: sometimes for pulling implicit info out of the network (viz. WiC), sometimes for intermediary steps (viz. coding), sometimes for verification like here.
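The verification flavor can be sketched as a small loop around any text-completion function. Everything here is hypothetical: `generate` stands in for whatever model call you use, and the prompt wording is an assumption, not the prompt from the linked article.

```python
from typing import Callable


def answer_with_check(
    question: str,
    generate: Callable[[str], str],  # hypothetical model call: prompt -> completion
    max_rounds: int = 2,
) -> str:
    """Get an answer, then ask the model to double-check it."""
    answer = generate(f"Q: {question}\nA:").strip()
    for _ in range(max_rounds):
        verdict = generate(
            f"Q: {question}\nProposed answer: {answer}\n"
            "Is the proposed answer correct? "
            "Reply 'yes' or give the corrected answer:"
        ).strip()
        if verdict.lower().startswith("yes"):
            break
        # Treat anything other than 'yes' as a corrected answer.
        answer = verdict
    return answer
```

Passing the model in as a plain callable keeps the sketch independent of any particular API, and lets you try it with a stub function before spending tokens.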


gwern 1630 days ago

The big caveat here is that the inner monologue papers generally work with GPT-3-175b, LaMDA, or Gopher, all of which are much bigger than 20b, and they generally show phase transitions (https://old.reddit.com/r/mlscaling/comments/sjzvl0/d_instanc...) in the monologue capability: below a critical size, inner monologue doesn't work at all, performing worse than baseline even, no matter how they scale, and only past the critical size does inner monologue suddenly start working much better. So it's possible (has anyone checked?) that GPT-NeoX-20b just isn't large enough to do inner monologue.


mlb_hn 1630 days ago

yeah, that's a very big caveat - haven't checked neo 20b yet. I've had a hard time getting the AI21 models to use it, and those are also pretty big, so it's interesting that it sometimes works and sometimes doesn't (and Davinci > Codegen Davinci > Curie > J-6B). Fine-tunes can also learn to do the inner monologue, which is really cool - not sure how much is architecture vs. training parameters.


ravi-delia 1630 days ago

I feel like attention would largely mitigate that, no? Has anyone looked at what the weights are while doing addition?
