by gwern 1584 days ago
The big caveat here is that the inner monologue papers generally work with GPT-3-175b, LaMDA, or Gopher, all of which are much bigger than 20b, and they generally show phase transitions (https://old.reddit.com/r/mlscaling/comments/sjzvl0/d_instanc...) in the monologue capability: below a critical size, inner monologue doesn't work at all, even performing worse than baseline, no matter how they scale, and only past the critical size does it suddenly start working much better. So it's possible (has anyone checked?) that GPT-NeoX-20b just isn't large enough to do inner monologue.

Yeah, that's a very big caveat. I haven't checked GPT-NeoX-20b yet. I've had a hard time getting the AI21 models to use inner monologue, and those are also pretty big, so it's not clear why it sometimes works and sometimes doesn't. (From what I've tried, it goes Davinci > Codegen Davinci > Curie > J-6B.) Fine-tunes can also learn to do inner monologue, which is really cool. I'm not sure how much of this is architecture vs. training parameters.
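
For anyone who wants to check a given model, here is a minimal sketch of the comparison being discussed: score the same questions with a plain few-shot prompt and with an inner-monologue prompt, then look at the difference. Everything in it is an assumption for illustration, not something from the papers: the exemplar text, the `generate` callable (any function that takes a prompt string and returns the model's completion), and the "last number in the completion" answer extraction.

```python
import re
from typing import Callable, Iterable, Tuple

# Hypothetical few-shot exemplars; swap in ones that match your task.
BASELINE_SHOT = (
    "Q: Tom has 3 apples and buys 2 more. How many apples does he have?\n"
    "A: 5\n\n"
)
MONOLOGUE_SHOT = (
    "Q: Tom has 3 apples and buys 2 more. How many apples does he have?\n"
    "A: Tom starts with 3 apples. He buys 2 more, so 3 + 2 = 5. "
    "The answer is 5.\n\n"
)


def build_prompt(question: str, monologue: bool) -> str:
    """Prefix the question with either the plain or the monologue exemplar."""
    shot = MONOLOGUE_SHOT if monologue else BASELINE_SHOT
    return f"{shot}Q: {question}\nA:"


def extract_answer(completion: str) -> str:
    """Return the last number before the model starts a new 'Q:' (assumed format)."""
    answer_part = completion.split("\nQ:")[0]
    numbers = re.findall(r"-?\d+(?:\.\d+)?", answer_part)
    return numbers[-1] if numbers else ""


def accuracy(
    generate: Callable[[str], str],
    dataset: Iterable[Tuple[str, str]],
    monologue: bool,
) -> float:
    """Fraction of (question, gold_answer) pairs answered correctly."""
    pairs = list(dataset)
    if not pairs:
        raise ValueError("dataset is empty")
    correct = sum(
        extract_answer(generate(build_prompt(question, monologue))) == gold
        for question, gold in pairs
    )
    return correct / len(pairs)


def monologue_gain(
    generate: Callable[[str], str],
    dataset: Iterable[Tuple[str, str]],
) -> float:
    """Monologue accuracy minus baseline accuracy; negative means it hurts."""
    pairs = list(dataset)
    return accuracy(generate, pairs, True) - accuracy(generate, pairs, False)
```

Running `monologue_gain` once per model size, with `generate` wrapping each model, would give one number per size. If the phase-transition claim holds, the gain should be zero or negative for the small models and jump above some size.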