Hacker News
by jwilber 751 days ago
My point is that you claimed this is a rebuttal to those claiming models don’t understand themselves. Your interpretation seems to assign intelligence to the algorithms.

While this research allows us to interpret larger models in an amazing way, it doesn’t mean the models themselves ‘understand’ anything.

You can use this on much smaller-scale models as well, as they showed 8 months ago. Does that research tell us about how models understand themselves? Or does it help us understand how the models work?

1 comment

"Understand themselves" is a very different thing than "understand what they are saying."

Which exactly are we talking about here?

Because no, the research doesn't say much about the former, but yes, it says a lot about the latter, especially on top of the many, many earlier papers working in smaller toy models demonstrating world modeling.