by bobsomers 453 days ago
In your astronomer example, what makes you attribute this to “planning” or look ahead rather than simply a learned statistical artifact of the training data?

For example, suppose English had a specific exception such that "astronomer" is always preceded by “a” rather than “an”. The model would learn this simply by observing that contexts describing astronomers are more likely to contain “a” than “an” as the likely next word, no?
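A toy sketch of the purely statistical reading described above, in case it helps make the question concrete. Everything here is an assumption for illustration: the five-sentence corpus is made up, and `article_counts` and `p_an` are hypothetical helper names, not anything from the paper. The estimator conditions only on words that appear before the article, so it never looks at the upcoming noun.

```python
from collections import Counter, defaultdict

# Tiny made-up corpus (an assumption for illustration, not real training data).
CORPUS = [
    "someone who studies the stars is an astronomer",
    "someone who studies the stars is an astrophysicist",
    "someone who studies the stars is a stargazer",
    "someone who studies the rocks is a geologist",
    "someone who studies the rocks is a mineralogist",
]

def article_counts(corpus):
    """For each context word, count which article later follows it."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for i, word in enumerate(words):
            if word in ("a", "an"):
                # Condition only on words *before* the article: no lookahead.
                for ctx in set(words[:i]):
                    counts[ctx][word] += 1
    return counts

def p_an(counts, ctx):
    """Estimated probability that the article is 'an', given a context word."""
    total = sum(counts[ctx].values())
    return counts[ctx]["an"] / total if total else 0.0

counts = article_counts(CORPUS)
print(p_an(counts, "stars"))  # 2 of the 3 "stars" sentences use "an"
print(p_an(counts, "rocks"))  # neither "rocks" sentence uses "an"
```

In this toy setup, "stars" in the context raises the estimated chance of "an" without any representation of the noun that follows. Whether a real model does something like this or something more structured is exactly what the behavioral data alone can't settle.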

I suppose you can argue that at the end of the day, it doesn’t matter whether I learn an explicit probability distribution for every next word given some context, or whether I learn some encoding of rules. But I certainly feel like the former is what we’re doing today (and why these models are so huge), rather than learning higher-level rule encodings, which would allow for significant compression and efficiency gains.

2 comments

Thanks for the great questions! I've been responding to this thread for the last few hours and I'm about to need to run, so I hope you'll forgive me redirecting you to some of the other answers I've given.

On whether the model is looking ahead, please see this comment, which discusses the fact that there's both behavioral evidence and (more crucially) direct mechanistic evidence: we can literally make an attribution graph and see an astronomer feature trigger "an"!

https://news.ycombinator.com/item?id=43497010

And this comment, also on the mechanism underlying the model saying "an":

https://news.ycombinator.com/item?id=43499671

On the question of whether this constitutes planning, please see this other question, which links it to the more sophisticated "poetry planning" example from our paper:

https://news.ycombinator.com/item?id=43497760

Let's note that the label you assign this feature is entirely speculative, i.e. it is your interpretation, not something the model actually "knows".
> In your astronomer example, what makes you attribute this to “planning” or look ahead rather than simply a learned statistical artifact of the training data?

What makes you think that "planning", even in humans, is more than a learned statistical artifact of the training data? What about learned statistical artifacts of the training data causes planning to be excluded?