by colah3 392 days ago
See https://transformer-circuits.pub/2022/toy_model/index.html#m...

If you're new to this, I'd mostly just look at all the empirical examples.

The slightly harder thing is to keep in mind that neural networks are made of linear functions with non-linearities between them, and to think about when that makes linear directions computationally natural.
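
To make that a bit more concrete, here is a rough sketch (not taken from the linked paper) of one reason a direction is convenient: if a feature is stored as a direction in activation space, the next linear layer can read it out with a single dot product, and a bias plus ReLU turns that into a thresholded detector. The width `d`, the random `direction`, the example `strengths`, and the assumption that everything else in the activation is exactly orthogonal to the feature are all made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16  # hypothetical activation width

# Hypothetical feature direction, normalized to unit length.
direction = rng.normal(size=d)
direction /= np.linalg.norm(direction)

# How strongly the feature is present on three example inputs.
strengths = np.array([0.0, 1.0, 2.5])

# Everything else in the activation, projected to be orthogonal to
# `direction` (a simplifying assumption so the readout is exact).
other = rng.normal(size=(3, d))
other -= np.outer(other @ direction, direction)

activations = strengths[:, None] * direction + other

# One row of the next layer's weight matrix is enough to read the feature.
readout = activations @ direction

# A bias followed by ReLU makes a detector that fires above strength 1.0.
detector = np.maximum(readout - 1.0, 0.0)

print(readout)
print(detector)
```

In a real network the other content would at best be nearly orthogonal, so the readout would pick up some interference; the sketch only shows why a linear layer finds directions cheap to work with.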