by ADeerAppeared
708 days ago
I explicitly say in my response that LLMs could do it. As a show of good faith, you should try reading the entire comment.

Yes, I'm using simple examples to demonstrate a particular difference, because using "real" examples makes getting the point across a lot harder.

You're also just wrong. I did in fact test, and both GPT-3.5 Turbo and GPT-4o failed. Not only with the rule change, but with the mere task of listing the possible moves. I only included the admission that they may succeed as a matter of due diligence: because of the randomization and API-specific pre-prompting involved, I cannot conclusively rule out that they could get the right answer.

> "For chess board r1bk3r/p2pBpNp/n4n2/1p1NP2P/6P1/3P4/P1P1K3/q5b1 (FEN notation), what are the available moves for pawn B5"
|
The argument you are making is based on the fact that the example is simple. If the example were not simple, you would not be able to use it to dismiss LLMs.
I am not surprised that GPT-3.5 and GPT-4o failed; they are both terrible models. GPT-4o is multimodal, but it is far buggier than GPT-4. I tried Claude 3.5 Sonnet and it got it on the first try. It was also able to compute the moves when told about the rule change.