Looks pretty nice. Unfortunately, I never run into this issue because I only get to write research code.
The Julia language was designed to target the two-language problem, and at least from these benchmarks it looks pretty competitive [1]. I imagine that over time Pythran may fix some limitations and beat Julia in most benchmarks.
Thanks for that benchmark link - puts the claims of the article in perspective.
That said, I want to try Pythran to see how it works for one of my at-home side projects.
I had to learn a bit of Julia about 20 months ago: an old customer got in a pinch when someone left before a deliverable, so I was immersed in Julia for two weeks. At first I liked the idea of Julia, but I didn't fall in love with the language.
I used to only use MATLAB, which for a lot of research code is actually nice for getting a prototype running quickly. Now that I have the freedom to choose, I typically use Julia, as I'm trying to gain skills in open-source languages that are actually valuable in the job market. Choosing Julia over Python is probably down to my nature of going against the grain, which flies in the face of my previous point.
Was there anything specific that turned you off from Julia?
One major issue with Julia is that it only recently reached 1.0, with a lot of breaking changes that made many libraries incompatible.
Another issue is that it's not always that fast. For a recent project I never managed to exceed 100 MFLOPS, at which point I switched to C++ and got 3 GFLOPS. The Python version stalled out at 4 MFLOPS, though...
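For anyone wanting to reproduce numbers like these, here is a rough sketch of how a FLOP rate is usually estimated: count the floating-point operations by hand and divide by the wall time. The `mflops` and `dot` helpers are made up for illustration, and the rate printed will vary by machine.

```python
import time

def mflops(flop_count, seconds):
    """Convert a floating-point operation count and a wall time into MFLOPS."""
    return flop_count / seconds / 1e6

def dot(xs, ys):
    """Plain-Python dot product: one multiply and one add per element."""
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

n = 200_000
xs = [1.0] * n
ys = [2.0] * n

start = time.perf_counter()
result = dot(xs, ys)
elapsed = time.perf_counter() - start

# 2 floating-point operations per element (multiply + add).
print(f"dot = {result}, about {mflops(2 * n, elapsed):.1f} MFLOPS")
```

On this scale, 3 GFLOPS is 3000 MFLOPS, so the gap described above is a factor of 30 between Julia and C++ and 25 between Python and Julia.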
Did you write type-unstable code? That brings Julia performance down and memory usage up, since things often get inferred to be Any. A vector of what you think are doubles could then turn into a vector of boxed Any values, which slows things down to the speed of Python. Fortunately, it's usually pretty easy to avoid this if you are aware of it.
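To make the boxed-versus-typed distinction concrete, here is a loose analogy in Python rather than Julia: a plain list behaves like a container of generic objects (roughly what a `Vector{Any}` is), while `array('d')` from the standard library stores raw doubles (closer to a `Vector{Float64}`). This only illustrates the storage difference, not how Julia's inference works. In Julia itself, `@code_warntype` is the usual tool for spotting the problem.

```python
from array import array

# A list holds references to arbitrary objects, so nothing stops mixed types.
boxed = [1.0, 2.0, 3.0]
boxed.append("oops")  # accepted: every element is just a generic object

# array('d') stores raw C doubles contiguously and rejects anything else.
typed = array("d", [1.0, 2.0, 3.0])
try:
    typed.append("oops")
    rejected = False
except TypeError:
    rejected = True

print(typed.itemsize)  # bytes per stored element
print(rejected)
```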
Well-written Julia should always be within a factor of 2-3 of C, often less. Huge problems are solved in pure Julia now. Pure Julia code has been run on HPC systems at over a petaflop, something that only C/C++ and Fortran had done before. 100 MFLOPS is not a limit of the language.
Here is a reference for the comment above and a brief excerpt:
"Written in the productivity language Julia, the Celeste project—which aims to catalogue all of the telescope data for the stars and galaxies in the visible universe—demonstrated the first Julia application to exceed 1 PF/s of double-precision floating-point performance (specifically 1.54 PF/s)." [1]
Very roughly, without knowing the details: the Julia implementation can probably be tuned to get close to 3 GFLOPS. Nothing in the language caps it at 100 MFLOPS, whereas in Python 4 MFLOPS might well be the best you can get.
Care to share your code and see if it can be improved upon?
I think the main performance bottleneck is that I'm adding to a submatrix, which seems to be a big performance hit in basically all high-level languages.
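The code in question isn't shown, so this is only a guess at the pattern: accumulating a small block into a window of a larger matrix, written with explicit loops so that no temporary copy of the window is made. `add_block` is a hypothetical name, and nested lists stand in for a real array type.

```python
def add_block(a, block, row, col):
    """Add `block` into `a` in place, with its top-left corner at (row, col)."""
    for i, block_row in enumerate(block):
        target = a[row + i]  # the row of `a` that receives this block row
        for j, value in enumerate(block_row):
            target[col + j] += value
    return a

a = [[0.0] * 4 for _ in range(4)]
add_block(a, [[1.0, 2.0], [3.0, 4.0]], 1, 1)
add_block(a, [[1.0, 2.0], [3.0, 4.0]], 1, 1)
print(a)
```

With NumPy arrays the same update would be written `a[row:row + h, col:col + w] += block`, where `h` and `w` are the block's height and width.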
Note that the blog post is also about deployment, not just about performance. Does Julia support statically compiled executables without dependencies or GC?
Unfortunately, not really. There is some community work in that regard [1], but it doesn't seem to get as much attention as one would like. Some people have gotten it to work, but official support (guaranteeing maintenance and decent documentation) for static compilation and easy deployment would make a huge difference.
They're similar in concept, but very different in focus. Nuitka's main goal is to be 100% compatible with CPython, which will often mean sacrificing performance compared to Pythran.
Pythran's main aim is to be fast, and to achieve this they're willing to support only a small subset of Python.
As Nuitka's performance gets better and Pythran starts to support more and more of Python, perhaps they'll converge at some point in the future.
Does Pythran work with things like OpenCV and sklearn as Python modules, or does code have to be written to explicitly enable them?
It feels to me like Pythran + OpenCV would be a killer combination, since it can take 300+ lines of C++ to achieve what you can with 40ish lines of NumPy, OpenCV and Python.
Was Cython given a consideration for this project?
I see that you are involved with the Pythran project, so could you tell us the shortcomings of Cython? As I understand it, Pythran previously didn't support Python 3, but it seems like that has changed.
In order to achieve top performance with Cython, in the context of numerical simulations, you generally end up explicitly writing the loops that are implicit in high-level NumPy (less abstraction).
Cython does not perform any high-level optimisation on the code, while Pythran does. For instance, Pythran computes whether an array index may be negative or not, and generates wraparound code only when needed. Cython, on the other hand, requires a compiler directive to do so.
That being said, Cython can do plenty of stuff Pythran cannot: import native libraries, wrap classes, mixed Python/native mode etc. It has a much stronger codebase (more tested/validated) and a larger community.
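As a small illustration of the explicit-loop style and the wraparound point above, here is a minimal sketch. The `#pythran export` comment is Pythran's annotation syntax; the function name and the list-based signature are made up for this example, and the file also runs as ordinary Python.

```python
#pythran export sum_sq_diff(float list, float list)
def sum_sq_diff(xs, ys):
    """Explicit-loop version of what NumPy would write as ((xs - ys) ** 2).sum()."""
    total = 0.0
    # i comes from range(len(xs)), so it is never negative: indexing with it
    # needs no wraparound handling.
    for i in range(len(xs)):
        d = xs[i] - ys[i]
        total += d * d
    return total

values = [1.0, 2.0, 3.0]
# Python semantics: a negative index wraps around to the end of the sequence.
last = values[-1]
print(sum_sq_diff([1.0, 2.0], [0.0, 0.0]), last)
```

Assuming the file is saved as `sum_sq_diff.py`, running `pythran sum_sq_diff.py` should build a native module from it. In Cython, the corresponding switch is the `wraparound` compiler directive.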
For me, the biggest shortcoming is that Cython does not create independent C++ code (that is, independent of the Python interpreter) that can be used in a separate C++ code base. My main point is that Pythran makes it possible to deploy Python/NumPy code as C++ code.
I feel that a comparison to a handwritten C++ version would make the claims a lot stronger. Making something 10x faster is not very hard if it is incredibly slow to begin with, and is, on its own, fairly uninteresting. On the other hand, if the results here approached the speed of optimized C++ code, then this workflow would make a lot of sense.
Seems there could be a cost/benefit analysis here.
Ten times faster than python might be sufficient for some applications given the potential for much faster deployment, regardless of whether handwritten C might be faster.
You're absolutely right: you don't necessarily need to be within 10% of pure C++ if algo development is made far easier by using python/numpy. But it would be good to have a hand-written C++ baseline to determine where the cost/benefit point is (at least for this example).
That's a fair point and I have to admit I didn't have the time/courage to port the entire algo to C++. But I can see why that would make the results a lot more convincing.
[1] https://github.com/fluiddyn/BenchmarksPythonJuliaAndCo/tree/...