by davidad_
4492 days ago
Here [1] is the comparison repository, for anyone interested.

[1] https://github.com/davidad/8queens/tree/%2Bc_comparison

EDIT: I don't seem to have -Ofast, but I added a note to the README asking people to try it. I also added the recommended flags for Sandy Bridge, i.e. -march=core2 -msse4.1 -msse4.2.
Edit: Using C++ templates to inline the DFS also helps. Unfortunately, this sort of technique doesn't really apply to the asm >_>

Edit again: I tried to go a bit further by replacing the board struct with your method of passing 3 bytes between levels of the DFS, but at that point the compiler was able to optimize out the whole program.
Removing the templates causes the compiler to actually emit a DFS, which by now is faster than the asm (!). Taking out the 2x speedup that comes from simply not exploring half of the tree slows us back down to ~1.1x the runtime of the asm.
Here's this last solution, the one that's ~1.1x slower than the asm on my machine: http://pastie.org/8784768
And with the free 2x speedup: http://pastie.org/8784770
One last time now: I tried getting rid of the x arg and using a global. This was much slower, so I tried getting rid of the SOLUTIONS global and using a return value. This caused the compiler to optimize out the whole program again :(