Hacker News
by dangond 1150 days ago
I know this discussion is a bit old at this point, but I came across this essay[1] for the first time today, and it shows more of what I was trying to get across earlier in the thread. Hopefully you'll find it interesting.

Essentially, a GPT was trained to predict the next move in a game of Othello, and by probing the network's internal activations, researchers found that they encode a representation of the game state. Specifically, given an input list of moves, the network computes the positions of its own pieces and those of its opponent (a tricky task for a NN, given that Othello pieces can switch sides because of moves made on the other side of the board). Doing this allowed it to minimize loss.

By analogy, it formed a theory about what makes moves legal in Othello (in this case, the positions of each player's pieces), and worked out how to calculate those positions in order to better predict the next move.
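
To make the "tricky task" part concrete, here is a rough sketch of what recovering the board from a bare list of moves involves when written out by hand. This is plain Python I put together for illustration, not code from the essay or the model; the function names (`initial_board`, `flips_for`, `replay`) and the "d3"-style move notation are my own assumptions, and passes are ignored to keep it short.

```python
# Directions a line of flanked pieces can run in: (row step, column step).
DIRECTIONS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

EMPTY, BLACK, WHITE = ".", "B", "W"


def initial_board():
    """Standard 8x8 Othello start: four pieces in the centre."""
    board = [[EMPTY] * 8 for _ in range(8)]
    board[3][3], board[4][4] = WHITE, WHITE
    board[3][4], board[4][3] = BLACK, BLACK
    return board


def flips_for(board, row, col, player):
    """Return the opponent pieces that playing (row, col) would flip."""
    opponent = WHITE if player == BLACK else BLACK
    flipped = []
    for dr, dc in DIRECTIONS:
        line = []
        r, c = row + dr, col + dc
        # Walk outward over a contiguous run of opponent pieces.
        while 0 <= r < 8 and 0 <= c < 8 and board[r][c] == opponent:
            line.append((r, c))
            r, c = r + dr, c + dc
        # The run only counts if it ends on one of the player's own pieces.
        if line and 0 <= r < 8 and 0 <= c < 8 and board[r][c] == player:
            flipped.extend(line)
    return flipped


def replay(moves):
    """Replay moves like "d3" (column letter, row number), Black first.

    Simplification: players strictly alternate, so passes are not handled.
    """
    board = initial_board()
    player = BLACK
    for move in moves:
        col, row = ord(move[0]) - ord("a"), int(move[1]) - 1
        flipped = flips_for(board, row, col, player)
        # A move is legal only on an empty square and only if it flips something.
        if board[row][col] != EMPTY or not flipped:
            raise ValueError(f"illegal move {move} for {player}")
        board[row][col] = player
        for r, c in flipped:
            board[r][c] = player
        player = WHITE if player == BLACK else BLACK
    return board
```

With `replay(["d3", "c3"])`, the piece on d4 flips to Black on the first move and back to White on the second, so the owner of a square depends on the whole move history rather than on who last played there. The network only ever sees the move list, so anything like this bookkeeping has to be learned.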

[1] https://www.neelnanda.io/mechanistic-interpretability/othell...