Hacker News new | ask | show | jobs
by slack3r 2682 days ago
It may be black magic now, but people are working on making the black magic something lesser mortals don't have to worry about (eg: https://theory.stanford.edu/~aiken/publications/papers/asplo...)

Unfortunately, no one is actually paying attention to all this research being done in ivory tower academics. Sad!

1 comments

Superoptimization is slow, and it's generally considered feasible only for shortish bits of code. A block of about 30 instructions takes around 10 minutes to superoptimize even on state-of-the-art stochastic superoptimizers.

You also need correct semantics for instructions. Go read the PCMPISTRI instruction and try to write out the SMT formulas for it so you can prove equivalence. Floating point is in an even worse state, since most interesting floating point transformations are only legal if you're willing to slacken requirements for bit-equivalence.

Superoptimization is not mature enough to become part of the regular compiler flow. What is happening is you're seeing people couple superoptimizers to find missing optimizations in the compiler--John Regehr is basically doing this for LLVM's InstCombine pass with Souper.

Lol, my comment was restricted to SIMD, which definitely does not take 10 minutes.

Secondly, no mainstream compiler actually compiles code to the PCMPISTRI instruction. Presumably it was meant to be used directly as assembly. I'm not sure why you are bringing in this obscure instruction into the discussion of superoptimizers.

I personally have a paranoid fantasy where NSA/GCHQ introduced this instruction to speed up password cracking. :D

You're citing the classic superoptimization paper, although it's actually quite old at this point. The goal of superoptimization has always been to try to compile to the no-real-C-equivalent instructions in an ISA, which includes the weird vector instructions such as PCMPISTRI. (I believe Strata attempted to match some of these instructions, and I think they managed to get formulas for about half of the weird immediate forms--these are the instructions that caused them to match "half" an instruction).

In any practical vectorized tight inner-loop, the block you're trying to optimize is inherently going to be large. Superoptimization is exponential in the size of the block being optimized, which limits its utility. That was my entire point: it becomes unacceptably expensive way too quickly to get used in compilers. (Some of the code I'm looking at right now has 100s of instructions in a single basic block, definitely not atypical for a compiler).

Don't be ridiculous, dude. No compiler is actually compiling to PCMPISTRI. It's meant to be used directly as assembly.

Of course I could be wrong. If you do know of any open-source compiler which compiles code to PCMPISTRI, let me know.

Superoptimization is not exactly exponential, it is NP-hard.

Of course, in the paranoid conspiracy I was referring to in my previous comment I predict the NSA/GCHQ is also using this directly via assembly. xD