by eiz 1162 days ago
> Where is the connection between computational details and the model's high-level behavior? Do we even know?

This is an active area of study ("mechanistic interpretability"), and it's very early days. For instance, here's a paper I read recently that tries to explain how a very simple transformer learns to do modular arithmetic: https://arxiv.org/abs/2301.05217

Curious what interesting results people are aware of in this area.