My understanding is that they encoded domain in several dozens of mechanical rules described in extended data table 1, and then did transformer-guided brute force search for solutions.
The problem is that LLM as a role for drawing auxiliary lines is too inefficient. It is hard to imagine people deploying a large number of machines to solve a simple IMO problem. This field must be in the early stage of development, and much work remains unfinished