HN Mirror


by danparsonson 549 days ago

Is it reasonable to imagine that LLMs should be able to play chess? I feel like we're expending a whole lot of effort trying to distort a screwdriver until it looks like a spanner, and then wondering why it won't grip bolts very well.

Why should a language model be good at chess or similar numerical/analytical tasks?

In what way does language resemble chess?

4 comments

ahoka 549 days ago

I think because LLMs are convincingly good at natural language tasks, humans tend to anthropomorphize them. Because of this, it is often assumed that they are good at everything humans are capable of.


Filligree 549 days ago

Okay, but I'm not good at playing chess without seeing the chessboard. In fact I'm pretty awful at that.


thepoet 549 days ago

It might be a reasonable ask for an LLM to 'remember' the endgame tablebase of solved games, which is less than a GB for all games with five or fewer pieces on the board. This puzzle specifically relies on this knowledge and the knowledge of how the chess pieces move.


GuB-42 548 days ago

LLM: given a sequence of words, what is the most likely next word

Chess engine: given a sequence of moves in a winning game, what is the most likely next move

I don't think LLMs will ever beat purpose-built engines, but it is not inconceivable for them to play better chess than most humans.
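To make the analogy concrete, here is a toy sketch of "most likely next move" as plain frequency counting over move prefixes. It is not how an LLM or a real chess engine works internally; the tiny corpus and the names `GAMES`, `build_model` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: a few opening lines in algebraic notation.
GAMES = [
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["e4", "e5", "Nf3", "Nc6", "Bc4"],
    ["e4", "e5", "Nf3", "Nc6", "Bb5"],
    ["d4", "d5", "c4", "e6", "Nc3"],
]


def build_model(games):
    """Count which move follows each move prefix seen in the corpus."""
    counts = defaultdict(Counter)
    for game in games:
        for i in range(len(game)):
            counts[tuple(game[:i])][game[i]] += 1
    return counts


def predict_next(model, prefix):
    """Return the most frequent continuation of prefix, or None if unseen."""
    followers = model.get(tuple(prefix))
    if not followers:
        return None
    return followers.most_common(1)[0][0]


model = build_model(GAMES)
print(predict_next(model, ["e4", "e5", "Nf3", "Nc6"]))  # Bb5 (seen in 2 of 3 games)
print(predict_next(model, ["a3"]))  # None: prefix never seen
```

The second call shows the obvious weakness: a pure lookup has nothing to say about a position it has never seen, whereas a learned model is supposed to generalize.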


porridgeraisin 549 days ago

Yeah, I don't think they are a useful measuring stick for LLMs.

My amateur opinion is that an "AI system" resembling AGI or ASI or whatever the acronym of the day is, will be modular, with different parts addressing different kinds of learning, rather than entirely end-to-end. One of the main milestones towards achieving this would be the ability to dynamically learn what is left to be learnt (finding gaps), and then potentially have it train itself to learn that, automatically. One of the half-milestones, I suppose, would be for humans to find gaps in the ability first of all.

I attended a talk recently where they presented research that tried to effectively distinguish the following two types of LLM failures:

1) inability to generalize/give the output at the "representation layer" itself

2) has the information represented, but is not able to retrieve it for the given reasonable prompt, and requires "context scaling"

Which is a step towards this goal, I suppose.
