by andreyk
234 days ago
But LLMs would presumably also condition on past observations of their opponents, i.e. they can in turn adapt their own strategy over repeated play (especially if given a reasoning budget rather than sampling an action directly from their output distribution). The rules state that the LLMs do get "Notes hero has written about other players in past hands" and that "Models have a maximum token limit for reasoning", so the outcome may at least be more interesting as a result. Notably, the top models on the leaderboard are also the ones strongest at reasoning. The site even shows the models' notes, e.g. Grok on Claude: "About: claude
Called preflop open and flop bet in multiway pot but folded to turn donk bet after checking, suggesting a passive postflop style that folds to aggression on later streets."

PS The sampling parameters also matter a lot: at temperature 0 the LLMs will play very consistently, while at higher temperatures they may get more 'creative'.

PPS Giving the models statistics about the other models' behavior seems kind of like cheating, and they rely on it heavily, e.g. 'I flopped middle pair (tens) on a paired board (9s-Th-9d) against LLAMA, a loose passive player (64.5% VPIP, only 29.5% PFR)'
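To make the temperature point concrete, here is a minimal sketch of temperature-scaled softmax sampling. It assumes, for simplicity, that a model's choice in a spot boils down to one pick among a few actions with scores ("logits"); real LLMs apply temperature per token, and the benchmark's actual sampling setup isn't stated. The action names and scores are hypothetical. At T=0 the pick collapses to the argmax, so the same spot is played the same way every time; raising T flattens the distribution.

```python
import math
import random

def temperature_probs(logits, temperature):
    """Softmax over logits / temperature: lower T sharpens, higher T flattens."""
    if temperature <= 0:
        raise ValueError("temperature must be > 0; use greedy decoding for T=0")
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(x - m) for x in scaled]
    total = sum(weights)
    return [w / total for w in weights]

def sample_action(logits, temperature, rng=random):
    """Pick an action index; temperature 0 is treated as greedy (argmax)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = temperature_probs(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

# Hypothetical scores for three actions in a single spot.
actions = ["fold", "call", "raise"]
logits = [1.0, 2.0, 0.5]

print(actions[sample_action(logits, 0)])  # always "call" at T=0
rng = random.Random(42)
print([actions[sample_action(logits, 1.5, rng)] for _ in range(5)])  # varies when T > 0
```

Comparing `temperature_probs(logits, 0.5)` with `temperature_probs(logits, 2.0)` shows the top action's probability shrinking as T grows, which is where the extra 'creativity' (and inconsistency) comes from.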
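For readers unfamiliar with the stats in that last quote: VPIP (voluntarily put money in pot) is the share of hands in which a player calls or raises preflop (posting a blind doesn't count), and PFR (preflop raise) is the share in which they raise. A large gap between the two, like 64.5% vs 29.5%, means the player mostly calls rather than raises, i.e. "loose passive". The sketch below uses the standard poker-tracker definitions; the `PreflopSummary` record is hypothetical, and how the benchmark itself tallies these numbers isn't stated.

```python
from dataclasses import dataclass

@dataclass
class PreflopSummary:
    """Hypothetical per-hand record of one player's voluntary preflop actions."""
    called: bool = False  # limped or called a raise (posting a blind does not count)
    raised: bool = False  # opened or re-raised

def vpip_pfr(hands):
    """Return (VPIP %, PFR %) over a list of PreflopSummary records."""
    n = len(hands)
    if n == 0:
        return 0.0, 0.0
    vpip_count = sum(1 for h in hands if h.called or h.raised)
    pfr_count = sum(1 for h in hands if h.raised)
    return 100 * vpip_count / n, 100 * pfr_count / n

# 10 hands: 4 flat calls, 2 raises, 4 folds -> loose and fairly passive.
hands = ([PreflopSummary(called=True)] * 4
         + [PreflopSummary(raised=True)] * 2
         + [PreflopSummary()] * 4)
print(vpip_pfr(hands))  # (60.0, 20.0)
```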