Hacker News
by danparsonson 503 days ago
Is it reasonable to imagine that LLMs should be able to play chess? I feel like we're expending a whole lot of effort trying to distort a screwdriver until it looks like a spanner, and then wondering why it won't grip bolts very well.

Why should a language model be good at chess or similar numerical/analytical tasks?

In what way does language resemble chess?

4 comments

I think because LLMs are convincingly good at natural language tasks, humans tend to anthropomorphize them. Because of this, it is often assumed that they are good at everything humans are capable of.
Okay, but I'm not good at playing chess without seeing the chessboard. In fact I'm pretty awful at that.
It might be a reasonable ask for an LLM to 'remember' the endgame tablebase of solved games, which is less than a GB for all games with five or fewer pieces on the board. This puzzle specifically relies on this knowledge and the knowledge of how the chess pieces move.
LLM: given a sequence of words, what is the most likely next word

Chess engine: given a sequence of moves in a winning game, what is the most likely next move

I don't think LLMs will ever beat purpose-built engines, but it is not inconceivable for them to play better chess than most humans.
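
To make the analogy above concrete, here is a minimal sketch of "predict the most likely next item" applied to both words and moves. It is only a toy frequency counter over the previous token, with made-up training data and hypothetical function names; real LLMs condition on far more context, and real chess engines rely on search and evaluation rather than this kind of lookup.

```python
from collections import Counter, defaultdict

def train_next_token_model(sequences):
    """Count which token follows each token across all training sequences."""
    follow_counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            follow_counts[prev][nxt] += 1
    return follow_counts

def most_likely_next(model, sequence):
    """Return the most frequent follower of the last token, or None if unseen."""
    if not sequence or sequence[-1] not in model:
        return None
    return model[sequence[-1]].most_common(1)[0][0]

# Hypothetical toy data: a few sentences, and a few opening lines from won games.
sentences = [
    ["the", "cat", "sat"],
    ["the", "cat", "ran"],
    ["the", "dog", "sat"],
]
games = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["e4", "e5", "Nf3", "Nc6", "Bc4"],
    ["e4", "c5", "Nf3", "d6"],
    ["d4", "d5", "c4"],
]

word_model = train_next_token_model(sentences)
move_model = train_next_token_model(games)

print(most_likely_next(word_model, ["the"]))  # most common word after "the"
print(most_likely_next(move_model, ["e4"]))   # most common reply to e4 in the toy data
```

The same two functions handle both cases, which is the point of the comparison: the model only sees sequences of tokens, whether those tokens are words or moves.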

Yeah, I don't think they are a useful measuring stick for LLMs.

My amateur opinion is that an "AI system" resembling AGI or ASI or whatever the acronym of the day is, will be modular, with different parts addressing different kinds of learning, rather than entirely end to end. One of the main milestones towards achieving this would be the ability to dynamically learn what is left to be learnt (finding gaps), and then potentially have it train itself to learn that, automatically. One of the half-milestones, I suppose, would be for humans to find the gaps in its abilities in the first place.

I attended a talk recently where they presented research that tried to effectively distinguish between the following two types of LLM failure:

1) inability to generalize/give the output at the "representation layer" itself

2) has the information represented, but is not able to retrieve it for the given reasonable prompt, and requires "context scaling"

Which is a step towards this goal, I suppose.