by IIAOPSW 1183 days ago
The answer is that for Chess it doesn't matter. The standard chess piece notation is a complete encoding of the game space. An inference about the nature of our board, based on physical understanding, is not needed. You could formulate chess as a purely text-based game about appending alphanumeric tokens to a chain of said tokens. It's a closed system. The machine need not be tied to our squares-and-horsey-based interpretation of the semantics. To be able to follow the grammar of the language chess is to understand chess.

In a similar vein, it is almost possible to adjudicate Diplomacy orders looking only at the orders and never the map.

Given sufficient interest, complex enough board games tend to converge on the same basic notational principles.


The internal model will certainly pick up on statistical correlations in the text that correspond to an 8x8 2D grid, as this is the most low-hanging statistical representation that helps solve the problem during training.

The same argument and result exist for the different human sensory modalities - neurons and connections self-organize to have the same topology and layout as the retina (2D) and frequency / time for the audio (also 2D).

In fact, wasn't this experiment already done for Othello and LLMs recently? Wasn't there a paper where they found the internal model for the board?

It can learn the rules for movement strictly as generator rules imposed on a string of tokens representing the previous sequence of moves. Each new item appended to the list has to in some way match a previous item in the list. E.g. RC6 is a Rook, so it has to match an earlier token that is also a Rook, in one of two ways: R_6 or RC_ (and it must not have been previously captured by __6 or _C_). At no point is it even necessary to convert the move history into a present board state, let alone the state of an 8x8 grid. The move history is sufficient board state on its own. Are the rules for valid chess moves, expressed as a 3-character token grammar, the same thing as having learned a 2D grid in latent space? I don't think so, because the language rule is more general and isn't restricted by geometry.
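To make the RC6 example concrete, here is a rough sketch of that matching step in Python. It assumes the simplified 3-character tokens used above (piece letter, file letter, rank digit) and a history seeded with the rooks' starting squares; the function name is made up for illustration. It only checks the "matches R_6 or RC_" condition, not captures, blocking pieces, colours, or whether the matched token is that rook's most recent square.

```python
import re

def rook_move_matches(history, move):
    """True if `move` (e.g. "RC6") shares a file or a rank with some earlier
    token for the same piece letter, i.e. it matches R_6 or RC_.
    Simplified: ignores captures, blocking pieces and stale positions."""
    piece, file, rank = move
    pattern = re.compile(
        f"{re.escape(piece)}(?:.{re.escape(rank)}|{re.escape(file)}.)"
    )
    return any(tok != move and pattern.fullmatch(tok) for tok in history)

# Hypothetical history: two rooks on their starting squares, one pawn move.
history = ["RA1", "RH1", "PE4"]
ok_same_file = rook_move_matches(history, "RA6")            # matches RA_
ok_no_match = rook_move_matches(history, "RC6")             # no R_6 or RC_ yet
ok_same_rank = rook_move_matches(history + ["RA6"], "RC6")  # matches R_6
```

Nothing here builds a grid: the check is a pattern over the token list, which is the point being argued.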

In principle it could reason about any incidence structure. That is, anything where the semantics is two types of objects, and a "touching" relation between them. Lines are just all the points along them, points are just all the lines intersecting there. For the purpose of directions, a train station is just a list of all the services that go there, and a service is just the list of stations where it stops. Etc etc. A language model is free to learn and understand these sorts of systems purely as relations on symbols without ever implicitly organizing it into a geometrical representation.
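As a sketch of the train example, assuming a made-up network where each service is just the list of stations it stops at: the station view is derived by inverting that mapping, and directions fall out of a breadth-first search over the "touching" relation alone. The names `invert` and `directions` are illustrative, and no coordinates or geometry appear anywhere.

```python
from collections import deque

def invert(services):
    """Turn {service: [stations]} into {station: [services]}."""
    stations = {}
    for name, stops in services.items():
        for stop in stops:
            stations.setdefault(stop, []).append(name)
    return stations

def directions(services, start, goal):
    """Breadth-first search using only incidence: returns the list of
    services to ride from start to goal, or None if unreachable."""
    stations = invert(services)
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        station, ridden = queue.popleft()
        if station == goal:
            return ridden
        for service in stations.get(station, []):
            for nxt in services[service]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append((nxt, ridden + [service]))
    return None

# Hypothetical network: two services sharing station "C".
network = {"red": ["A", "B", "C"], "blue": ["C", "D"]}
trip = directions(network, "A", "D")
```

With this toy network, `trip` comes out as the red service followed by the blue one, changing at the shared station.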

This is all good news. It means Chess, Transit, Diplomacy, and many other things can fit nicely into pure language reasoning without trying to ground the language in the semantics of our physical nature with its dimensions of space and time and whatever.

What would change my mind is if, after learning the rules for Chess as string matching, it invented a word for "row" and "column" on its own.

That paper is at the link containing "othello" upstream.
>> To be able to follow the grammar of the language chess is to understand chess.

That's interesting. I think you're saying that the rules of chess can be described as a transformation system [1] over the set of strings of chess algebraic notation?

_________

[1] A system of transformation rules, as in transformation grammars. Not as in Transformers. Just a coincidence.

Well, let me try to explain what I'm thinking, though I may have misunderstood you.

The rules of chess allow you to enumerate all the possible transformations from one board state to the next. This is just a fancy way of saying all possible moves in any given board state. By induction this means that given an initial board state and a series of moves from that board state you can determine the final board state.

So this means that the rules of chess allow you to enumerate given an initial state and n plies, all possible ways of adding an (n + 1)th ply.

So if you just assume the initial board state is always the starting position, theoretically you could do away with thinking about board states altogether. Now, whether that's sensible in terms of computational complexity is another question entirely and my intuition is no.

>> So this means that the rules of chess allow you to enumerate given an initial state and n plies, all possible ways of adding an (n + 1)th ply.

Ah, I get you. Yeah, OK, makes sense. You can generate all legal moves from a starting board state given the rules of chess. Yep.

Yeah, that's exactly it. The rules are easy enough to put into the form of matching strings. I gave an explicit example further down in the thread. At no point is it required to even convert the game history into a board state data structure. The game history in standard notation itself is sufficient as a game state. To know where a piece is, simply iterate back from the end until finding the last mention of it, then iterate forward to make sure it wasn't captured.
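A minimal sketch of that lookup, under a big simplifying assumption: each token is 3 characters and its first character uniquely identifies one piece (real notation has two rooks, two colours, and so on, which would need disambiguating). A "capture" is taken to be any later token that lands on the same square. The name `locate` is just for illustration.

```python
def locate(history, piece):
    """Return the current square of `piece`, or None if it was captured
    or never mentioned. Works directly on the move list, no board."""
    # Walk backwards to the last mention of the piece.
    for i in range(len(history) - 1, -1, -1):
        if history[i][0] == piece:
            square = history[i][1:]
            # Walk forwards: did anything land on that square afterwards?
            for later in history[i + 1:]:
                if later[1:] == square:
                    return None  # captured
            return square
    return None  # never mentioned

# Hypothetical history: the queen ends up taking the rook on A6.
moves = ["RA1", "QD8", "RA6", "QA6"]
rook_square = locate(moves, "R")
queen_square = locate(moves, "Q")
```

Here the rook lookup returns nothing because the queen's last token lands on its square, while the queen is found on A6.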