One would assume there are some low-hanging fruits that make up the bulk of the speed-ups, but maybe it’s really a huge pile of small incremental improvements?
My impression is that going from a very simple case like this to a moderately fast solver involves bringing in a fair number of intuitive improvements like memoization of anti-patterns and heuristic reordering of the search.
But getting a really fast solver requires multiple strategies that don't work well together (so they have to work semi-independently). This leads to other problems related to managing shared resources like how much you should let different strategies cache information at the expense of how much other strategies can cache information. Tuning all of these trade-offs is exceedingly difficult to do well since it depends a lot on the types of problems that you need to solve.
But getting a really fast solver requires multiple strategies that don't work well together (so they have to work semi-independently). This leads to other problems related to managing shared resources like how much you should let different strategies cache information at the expense of how much other strategies can cache information. Tuning all of these trade-offs is exceedingly difficult to do well since it depends a lot on the types of problems that you need to solve.