by wongarsu 1132 days ago
It also lags one iteration behind, which is a problem because a misaligned model might lie to you, spoiling all future research done with this method.
1 comment

It doesn't have to lag, though. You could ask GPT-2 to explain GPT-2; the weights are just input data. This wasn't done on GPT-3 or GPT-4 only because a) they're much bigger, and b) they're deeper, so the roles of individual neurons are more attenuated.