Post-flop, on the other hand, is all over the place...
in fact, a fun project would be to take a non-reasoning model, have it play a lesser-known game format, and see whether it has an "aha" moment or starts explicitly simulating moves ahead