| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by gwern 1631 days ago
	The big caveat here is that the inner monologue papers generally work with GPT-3-175b, LaMDA, or Gopher, all of which are much bigger than 20b, and they generally show phase transitions (https://old.reddit.com/r/mlscaling/comments/sjzvl0/d_instanc...) in the monologue capability: below a critical size, inner monologue doesn't work at all, performing worse than baseline even, no matter how they scale, and only past the critical size does inner monologue suddenly start working much better. So it's possible (has anyone checked?) that GPT-NeoX-20b just isn't large enough to do inner monologue.

1 comments

mlb_hn 1631 days ago

yeah, that's a very big caveat - haven't checked neo 20b yet. I've had a hard time getting the AI21 models to use it and those are also pretty big so it's interesting why sometimes it works and sometimes it doesn't. (and Davinci > Codegen Davinci > Curie > J-6B). Fine tunes can also learn to do the inner monologue as well which is really cool - not sure how much is architecture vs. training parameters.

link