Pythran as a bridge between fast prototyping and code deployment

	Pythran as a bridge between fast prototyping and code deployment (serge-sans-paille.github.io)
	108 points by serge-ss-paille 2764 days ago

5 comments

rademacher 2764 days ago

Looks pretty nice. Unfortunately, I never run into this issue because I only get to write research code.

The Julia language was designed to target the two-language problem, and at least from these benchmarks it looks pretty competitive [1]. I imagine that over time Pythran may fix some limitations and beat Julia in most benchmarks.

[1] https://github.com/fluiddyn/BenchmarksPythonJuliaAndCo/tree/...

mark_l_watson 2764 days ago

Thanks for that benchmark link - puts the claims of the article in perspective.

That said, I want to try Pythran to see how it works for one of my at-home side projects.

I had to learn a bit of Julia about 20 months ago: an old customer got in a pinch when someone left before a deliverable, so I was immersed in Julia for two weeks. At first I liked the idea of Julia, but I didn’t fall in love with the language.

rademacher 2764 days ago

I used to use only MATLAB, which for a lot of research code is actually nice for getting a prototype running quickly. Now that I have the freedom to choose, I typically use Julia, as I'm trying to gain skills in open-source languages that are actually valuable in the job market. The choice of Julia over Python is probably due to my nature of going against the grain, which flies in the face of my previous point.

Was there anything specific that turned you off from Julia?

svantana 2764 days ago

One major issue with Julia is that it only recently reached 1.0, with a lot of breaking changes that made a lot of libraries incompatible.

Another issue is that it's not always that fast. For a recent project I never managed to exceed 100 MFLOPS, at which point I switched to C++ and got 3 GFLOPS. The Python version stalled out at 4 MFLOPS, though...
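For readers who want to reproduce this kind of figure, here is a minimal sketch of how an MFLOPS number can be estimated in plain Python. The helper name `measure_mflops` and the toy multiply-add loop are assumptions made for illustration; they are not the project code discussed above, and the count of two floating-point operations per iteration applies only to this toy loop.

```python
import time


def measure_mflops(n=200_000):
    """Time a multiply-add loop and return (MFLOPS, final accumulator)."""
    acc = 0.0
    x = 1.000001
    start = time.perf_counter()
    for _ in range(n):
        acc = acc * x + 1.0  # one multiply + one add = 2 floating-point operations
    elapsed = time.perf_counter() - start
    flops = 2 * n  # total floating-point operations performed by the loop
    return flops / elapsed / 1e6, acc


mflops, _ = measure_mflops()
print(f"{mflops:.1f} MFLOPS")
```

The printed value depends entirely on the machine and interpreter, so it is only meaningful when compared against the same loop compiled or rewritten in another language.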

cultus 2764 days ago

Did you write type-unstable code? That brings Julia performance down and memory usage up, since things often get inferred to be Any. So, a vector of what you think are doubles could be turned into a boxed vector of Any. It will slow things down to the speed of Python. Fortunately, it's usually pretty easy to avoid this if you are aware of it.

Well-written Julia should always be within a factor of 2-3 of C, often less. Huge problems are done in pure Julia now. Pure Julia code has been run on HPC systems at over a petaflop, something that only C/C++ and Fortran had done before. 100 MFLOPS is not a problem.

rademacher 2764 days ago

Here is a reference for the comment above and a brief excerpt:

"Written in the productivity language Julia, the Celeste project—which aims to catalogue all of the telescope data for the stars and galaxies in the visible universe—demonstrated the first Julia application to exceed 1 PF/s of double-precision floating-point performance (specifically 1.54 PF/s)." [1]

[1] https://www.nextplatform.com/2017/11/28/julia-language-deliv...

stabbles 2764 days ago

Very roughly, without knowing details: the Julia implementation can probably be tuned to get close to 3 GFLOPS. It's not that the language has a limitation that keeps it at 100 MFLOPS, whereas in Python 4 MFLOPS might well be the best you can get.

Care to share your code and see if it can be improved upon?

svantana 2763 days ago

I think the main performance bottleneck is that I'm adding to a submatrix, which seems to be a big performance hit in basically all high-level languages.
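To make the operation concrete, here is a minimal pure-Python sketch of adding a small patch into a submatrix in place. The function name `add_to_submatrix` and the list-of-lists layout are assumptions for illustration; the original code was not shared. In NumPy the same update is usually written as a slice assignment, `a[r0:r0+h, c0:c0+w] += patch`.

```python
def add_to_submatrix(matrix, patch, row0, col0):
    """Add `patch` in place into `matrix`, top-left corner at (row0, col0)."""
    for i, patch_row in enumerate(patch):
        target_row = matrix[row0 + i]  # look up the destination row once
        for j, value in enumerate(patch_row):
            target_row[col0 + j] += value
    return matrix


grid = [[0.0] * 4 for _ in range(3)]
add_to_submatrix(grid, [[1.0, 2.0], [3.0, 4.0]], row0=1, col0=2)
print(grid)
```

Explicit loops like these are slow in the interpreter, but they are the form that compilers for numerical Python are generally meant to handle.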

tomp 2764 days ago

Note that the blog post is also about deployment, not just about performance. Does Julia support statically compiled executables without dependencies or GC?

keldaris 2764 days ago

Unfortunately, not really. There is some community work in that regard [1], but it doesn't seem to get as much attention as one would like. Some people have gotten it to work, but official support (guaranteeing maintenance and decent documentation) for static compilation and easy deployment would make a huge difference.

[1] https://github.com/JuliaLang/PackageCompiler.jl

byt143 2764 days ago

Better support for static compilation is on the roadmap.

yahyaheee 2764 days ago

I was also going to mention Julia: it aims to solve this problem and has a really great community.

zedr 2764 days ago

Isn't this similar to the Nuitka project?

http://nuitka.net/

Nuitka has fantastic Python 3 support (up to 3.7 currently).

dagw 2764 days ago

They're similar in concept, but very different in focus. Nuitka's main goal is to be 100% compatible with CPython, which will often mean sacrificing performance compared to Pythran.

Pythran's main aim is to be fast, and to achieve this they're willing to support only a small subset of Python.

As Nuitka's performance gets better and Pythran starts to support more and more of Python, perhaps they'll converge at some point in the future.

welder 2764 days ago

Nuitka produces binaries, Pythran can produce C++ source code.

fredsanford 2764 days ago

Does Pythran work with things like opencv and sklearn as python modules or does code have to be written to explicitly enable them?

It feels to me like Pythran + opencv would be a killer combination since it can take 300+ lines of C++ to achieve what you can with 40ish lines of numpy, opencv and python.

ktta 2764 days ago

Was Cython given a consideration for this project?

I see that you are involved with the Pythran project, so could you tell us the shortcomings of Cython? As I understand it, Pythran previously didn't support Python 3, but it seems that has changed.

ktta 2764 days ago

Can't edit on my app, but the first question was for the blog post writer and the second for the submitter.

serge-ss-paille 2764 days ago

> could you tell us the shortcomings of Cython

In order to achieve top performance in the context of numerical simulations, you generally end up explicitly writing the loops that are implicit in high-level NumPy code (less abstraction).

Cython does not perform any high-level optimisation on the code, while Pythran does. For instance, Pythran computes whether an array index may be negative or not, and generates wraparound code only when needed. Cython, on the other hand, requires a compiler directive to do so.

That being said, Cython can do plenty of stuff Pythran cannot: import native libraries, wrap classes, mixed Python/native mode etc. It has a much stronger codebase (more tested/validated) and a larger community.
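What "wraparound" refers to in the index example above can be sketched in plain Python. This only illustrates the index semantics and is not Pythran's actual analysis or generated code; all function names are hypothetical. In Cython, the corresponding directive is `wraparound` (for example `@cython.wraparound(False)`).

```python
def normalize_index(i, n):
    """Map a possibly negative index to its non-negative position (Python semantics)."""
    if i < 0:
        i += n  # the "wraparound" step: -1 becomes n - 1
    if not 0 <= i < n:
        raise IndexError("index out of range")
    return i


def sum_forward(values):
    """The loop index is always in [0, len(values)), so no wraparound is needed."""
    total = 0.0
    for i in range(len(values)):
        total += values[i]
    return total


def get_from_end(values, k):
    """The index -k may be negative, so the wraparound step has to stay."""
    return values[normalize_index(-k, len(values))]
```

A compiler that can prove the first case skips the extra branch on every access, while the second case still needs it.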

jeanl 2764 days ago

For me, the biggest shortcoming is that Cython does not create independent C++ code (independent of the Python interpreter, that is) that can be used in a separate C++ code base. My main point is that Pythran makes it possible to deploy Python/NumPy code as C++ code.

ktta 2756 days ago

I didn't realize you're the author of the blog post!

Thanks to both of you for the reply

kristofferc 2764 days ago

I feel that a comparison to a handwritten C++ version would make the claims a lot stronger. Making something 10x faster is not very hard if it is incredibly slow to begin with and is, on its own, fairly uninteresting. On the other hand, if the results here approached the speed of optimized C++ code, then this workflow would make a lot of sense.

SubiculumCode 2763 days ago

Seems there could be a cost/benefit analysis here. Ten times faster than python might be sufficient for some applications given the potential for much faster deployment, regardless of whether handwritten C might be faster.

jeanl 2763 days ago

You're absolutely right: you don't necessarily need to be within 10% of pure C++ if algo development is made far easier by using python/numpy. But it would be good to have a hand-written C++ baseline to determine where the cost/benefit point is (at least for this example).

jeanl 2764 days ago

That's a fair point and I have to admit I didn't have the time/courage to port the entire algo to C++. But I can see why that would make the results a lot more convincing.
