I would imagine Nuitka is slower, because it cannot apply the dynamic optimisations that are the key technique for making a language like Python fast. But it should be more predictable and regular, as the dynamic optimisations that PyPy uses are vulnerable to performance cliffs.
Are there any profile-guided optimisation methods for Python? Like, something that saves information from PyPy's dynamic optimisations on first run and uses them later for an AOT approach?
Yes, compare to LuaJIT. On a good day, it can bite C's shiny metal ass, but consistent and reliable application performance requires intimate knowledge of those performance cliffs.
Can it actually? I've certainly seen benchmarks where a JITed program is comparable to or marginally faster than a C program, but I've never seen one where it outright trounces C.
Yes. A JIT compiler can compile based on both code and data, and can compile all the dependencies. In contrast, an AOT compiler can use only the code, and it links to other pre-compiled binaries.
Further, a JIT can compile for the specific hardware it is running on. An AOT compiler must anticipate what hardware is possible. If you AOT-compile for several possible hardware configs, then you'll bloat the binary.
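To make the "code and data" point concrete, here is a toy sketch in Python. It is only an illustration of specialising on runtime data, not how PyPy, LuaJIT, or any real JIT works internally, and the names `poly_generic` and `specialize_poly` are made up for this example. The generic function has to loop over coefficients it knows nothing about ahead of time; the specialised one is generated after the coefficients are known, so they are baked in as constants and the loop is unrolled.

```python
def poly_generic(coeffs, x):
    """Evaluate a polynomial with Horner's rule; coeffs are only known at runtime."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


def specialize_poly(coeffs):
    """Generate a function for one specific coefficient list (hypothetical helper).

    The coefficients become literals in the generated source and the loop
    disappears, which is the kind of decision that needs the data in hand.
    """
    expr = "0"
    for c in coeffs:
        expr = f"({expr} * x + {c!r})"
    namespace = {}
    exec(f"def poly_specialized(x):\n    return {expr}\n", namespace)
    return namespace["poly_specialized"]


coeffs = [2, 0, -3, 5]  # 2x^3 - 3x + 5, imagine this was read from input
poly_fast = specialize_poly(coeffs)
assert poly_fast(4) == poly_generic(coeffs, 4)
```

Whether the specialised version is actually faster depends on the interpreter and the workload, so treat this as a picture of the idea rather than a benchmark.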
Which is great in theory, but what I was intending to ask was: are there any actual examples of JITed programs that beat C by a large margin? Because, as I say, I've only ever seen ones that are comparable to C.
Trouble is, you don't tend to write a big program in two languages the exact same way. The rewrite nearly always wins because of algorithm improvements and cleaning cruft. So it's hard to benchmark.
If two small programs are about the same speed in C vs JITlang, then the theory says the JITted one should take over as the program gets bigger.