Hacker News
by whimsicalism 755 days ago
neural probing has been around for a while, true, and this result definitely builds on past results. it’s basically just a scaled-up version of their paper from a little while ago anyway

but Karpathy was looking at very simple LSTMs of 1-3 layers and inspecting individual nodes/cells, and so far those results have generally been difficult to replicate in large-scale transformers. Karpathy also doesn’t provide a recipe for doing this in his paper, which makes me think he was just guessing and checking various cells. The representations discovered are very simple
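
for what it's worth, here is a rough sketch of what a systematic version of "guess and check over cells" could look like: rank every cell by how strongly its activation correlates with some hand-labeled feature. to be clear, this is not Karpathy's method (the paper gives no recipe) and not the method from the result being discussed. the activations are synthetic, and the function name, the shapes, and the planted cell 5 are all made up for illustration.

```python
import numpy as np

def rank_cells_by_correlation(activations, feature):
    """Rank cells by |Pearson r| between each cell's activation and a feature.

    activations: array of shape (T, n_cells), one row per timestep/token.
    feature:     array of shape (T,), e.g. 1.0 when "inside a quote", else 0.0.
    Returns (order, r): cell indices sorted from most to least correlated,
    and the per-cell correlation values.
    """
    a = activations - activations.mean(axis=0)
    f = feature - feature.mean()
    denom = np.sqrt((a ** 2).sum(axis=0) * (f ** 2).sum())
    # Constant cells (or a constant feature) get r = 0 instead of a division by zero.
    r = (a * f[:, None]).sum(axis=0) / np.where(denom == 0, 1.0, denom)
    order = np.argsort(-np.abs(r))
    return order, r

# Synthetic stand-in for recorded hidden states (assumption: no real model here).
rng = np.random.default_rng(0)
T, n_cells = 500, 16
feature = (rng.random(T) < 0.3).astype(float)
acts = rng.normal(size=(T, n_cells))
acts[:, 5] += 3.0 * feature  # plant the signal in one hypothetical cell

order, r = rank_cells_by_correlation(acts, feature)
best_cell = int(order[0])
print("most correlated cell:", best_cell)
```

this only finds features that live in a single cell, which is part of why this kind of result is hard to carry over to large transformers, where a feature may be spread across many units.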