Pythran: Crossing the Python Frontier [pdf]

Y	Hacker News new \| ask \| show \| jobs

	Pythran: Crossing the Python Frontier [pdf] (computer.org)
	122 points by serge-ss-paille 2983 days ago

10 comments

eljost 2983 days ago

Interesting article but just skimming through it some things stand out immediately: 1.) The first snippet isn't even valid python code as floats don't have a shape attribute.

  s = 0. 
  n = s.shape

2.) The inline latex math isn't rendered properly.

link

kristofferc 2983 days ago

The first snippet also doesn't balance the parenthesis

    s += 100. * x[i + 1] - x[i] ** 2.) ** 2. + (1 - x[i]) ** 2

link

chestervonwinch 2983 days ago

It should be the shape of x (actually, the zero'th element of the shape), but this is also a tad odd because this would assume that x is a numpy array, which isn't introduced until after this 'naive pure python' code block (i.e., before numpy is even introduced in the text).

link

rlayton2 2983 days ago

I believe the input should be a numpy array of floats which has a shape attribute

link

serge-ss-paille 2983 days ago

(shameful author here)

One should read

  def rosen_explicit_loop(x): 
    s = 0. 
    n = x.shape[0] 
    for i in range(0, n - 1): 
      s += 100. * (x[i + 1] - x[i] ** 2.) ** 2. + (1 - x[i]) ** 2 
    return s

(edited)

link

chestervonwinch 2983 days ago

...

    n = x.shape[0]

link

quietbritishjim 2983 days ago

Or just len(x). This works perfectly well on numpy arrays and has the bonus that it works on regular lists/tuples of floats, so the first snippet doesn't rely on numpy.

link

accurrent 2983 days ago

Link to the actual software: https://github.com/serge-sans-paille/pythran

link

emmelaich 2983 days ago

> "Pythran is an ahead of time compiler for a subset of the Python language, with a focus on scientific computing."

and

> a claimless python to c++ converter

link

serge-ss-paille 2983 days ago

(main dev writting) Don't focus too much on the claimless :-) There still are far more wide spread tools that solve the same kind of issues (numba, cython, julia)...

The idea of the change was that it's more important to convey the problem it solves rather than how it's done ;-)

link

UncleEntity 2983 days ago

Nice project, I've wanted to do something similar using my libjit python bindings but never seem to work up the gumption.

Simple and Effective Type Check Removal through Lazy Basic Block Versioning[0] could be adapted to make it unnecessary to compile more than one version of the function at a (probably) minor cost of performance. It's geared more towards jitted code but something like it where it compiles different blocks where the types matter and falls back to interpreted code if users pass in some random types not expected just might work -- or the python interpreter throws an exception if the types don't make sense.

[0]http://drops.dagstuhl.de/opus/volltexte/2015/5219/pdf/9.pdf

link

targafarian 2982 days ago

Claims about Numba by the author seem a little unkind if not wrong to me.

Numba can handle vectorized (numpy behavior) directly in addition to explicit loops. The former is accelerated less in comparison to plain python calling numpy (since if you can use numpy operations directly, it's already really fast) but the numpy bits in Numba can also be automatically parallelized by Numba. Explicit loops in numba are accelerated hundreds of times over Python loops (and you can use e.g. prange to write parallel loops, too). Point is, the two paradigms can be mixed and matched at will within Numba.

It seems like every example is of cython, but then the author generalizes the conclusions to Numba as well. It would be much more "honest" to show side-by-side comparison of Numba, Cython, and Pythran, since these all have different syntaxes and are fairly different tools.

Another example is that you don't have to rewrite functions for different argument types in Numba, but you do in Cython (see "convolve_laplacian" example, which can work with a simple decorator as a numba function). There again, the impression is given that Numba suffers from the same issue as Cython (and as mentioned elsewhere in the comments here, it's possible that Cython has a way around this, but I don't know the details).

link

AnimalMuppet 2983 days ago

Off topic, but this seems like as good a place as any to ask: It's my impression that numpy is really good. Is it as good as Fortran? That is, if I have a large, sparse, complex matrix, Fortran will have an efficient solver for it that will also be numerically stable, and will have four decades of use to find any weaknesses. Is numpy equivalent (except for the four decades part)? Is it close? Or does it just cover the basic cases well, and for the specializations you're on your own?

link

zb 2983 days ago

NumPy is designed to work with SciPy, which is a wrapper for literally the same 4 decade old Fortran libraries (LAPACK &c.) that you're referring to here.

link

gnufx 2982 days ago

Note that LAPACK isn't four decades old (and probably should be superseded anyway). You perhaps won't want to use large-scale numerical libraries that are that old unless they've had a lot of development and been well parallelized.

One issue with mixed language programming that we learned decades ago is the issue of debugging that there typically is across the interface. Also general tool support. I recently asked the local Python expert about HPC-style profiling of Python calling C(++) libraries, for instance. (I couldn't make TAU work.)

link

geoalchimista 2983 days ago

> "As a matter of comparison, Cython does not support principles 1, 2, or 3 and has optional support for 4."

For 2 Type agnosticism, this can be emulated with a "fused type" in Cython. See this example: http://cython.readthedocs.io/en/latest/src/userguide/numpy_t...

But I think the major inconvenience with Cython vectorization is really not about `float32` and `float64`. You get `float64` NumPy array by default from floating-point calculations. The actual inconvenience is that the vectorized function cannot take a scalar input like the NumPy ones. To remain polymorphic, I usually have to perform an `is_scalar` check on the input in a Python wrapper before sending the input data to the Cython function.

link

targafarian 2982 days ago

Note that the alternative in Numba is incredibly convenient, where you can trivially create a function where the body of the function operates on a scalar, but then using the @vectorize decorator which makes it into a numpy ufunc automatically. This generalizes the function to operate equally well on scalars or numpy arrays of any dimensionality (just like "built-in" numpy functions do).

Oh, and if you use set target='gpu', your function also works on GPUs, too. Or you can use target='parallel' to make it parallelize automatically across CPU cores.

link

gnufx 2982 days ago

When Python was announced, the three main things I remember striking dynamic languages people (apart from using a bastardized offside rule) were weird scope rules, lack of proper GC, and that it appeared to be designed particularly to preclude efficient implementation. We've subsequently seen the huge amount of effort that's been devoted to different ways of working around the implementation issue.

link

fermigier 2983 days ago

Talk from PyParis 2015: https://www.youtube.com/watch?v=Af8B30mXZ7E

Slides: https://fr.slideshare.net/PoleSystematicParisRegion/track-32...

link

icebraining 2983 days ago

Also from this year's FOSDEM: https://fosdem.org/2018/schedule/event/pythran/

link

dilawar 2983 days ago

I wonder why the author did not compare performance of pypy? I guess pypy is jit compiler.

link

dec0dedab0de 2983 days ago

I was going to say that most scientific python libraries use numpy, but a quick google shows that pypy supports numpy now.

link

joshsyn 2983 days ago

Isn't this solved by julia? I think scientific community should use a more functional language rather than language like python tbh

link

poster123 2983 days ago

Or by modern Fortran, which has had array operations since the 1990 standard? There are functional elements such as PURE functions and the FORALL construct.

link

ataspinar 2983 days ago

"Fortran is the only language that has an International Standards Body that sees that sees scientific programmers (and by extensions HPC programmers) as their target audience." https://www.youtube.com/watch?v=4Gp5YJfinOA

link

Sean1708 2983 days ago

I've always felt that it was a bit of a shame that Fortran is falling by the wayside nowadays, a lot of universities are teaching C++ (or even C) instead now and it just isn't as pleasant to use for scientific computing work.

link

knlji 2983 days ago

Maybe. It's currently just a safer bet to learn and use Python. Easier to get a job after you fail getting your next grant. I have so far seen zero Julia job ads. Hell, I see more e.g. Haskell and Fortran job ads than Julia.

link

montalbano 2983 days ago

or learn both and have the convenience+speed of Julia for scientific work? Speaking from my experience here as a bioengineering PhD student. The Julia learning curve is low enough for an experienced Python/scipy user to switch over fairly swiftly.

edit: also, I would be very surprised if you had seen any Julia job ads, v1.0 hasn't been released yet. Doesn't mean it can't make my scientific life easier in the meanwhile.

link

overkalix 2983 days ago

I don't use Julia and I only use Python occasionally, but I'd rather go for Cython than Julia...

link

ChrisRackauckas 2976 days ago

Julia gives you such a competitive advantage that you can easily position yourself into a great career. It worked out really well for me. Of course YMMV, but having a much greater productivity and software quality never hurts.

link

montalbano 2983 days ago

Though I'll still use Python for non-scientific programming, I've switched to Julia for all my scientific programming needs in my day job.

link

evrydayhustling 2983 days ago

In research programming, you often spend as much or more time in data acquisition and munging than implementing core algorithms. Plus, more than in production code, the requirements change as you explore different applications and approach. And, because it's not production code, you have more opportunity to explore outputs at different stages to review function. It's effectively continual prototyping.

All of these things play to python's main strengths: huge community with connectors to every API and format, plus ability to conveniently integrate code at several levels of complexity & maturity as you prototype.

link

kwertzzz 2983 days ago

I am giving a class on numerical methods and one student choose to use python (while my example code was in Julia). It was a pain to see how this student was constantly shooting himself in the foot due some particular behaviors of python and numpy. For instance the student did not expect that a list comprehension iterating over a numpy vector returns just a python vector. Also the fact that the index ranges the last value is excluded let to several bugs. The exercise involved a time dependent matrix and the student choosed to represent it as a 3d Array, but then he needed to constantly convert slices as a matrix to use matrix multiplication (maybe this is now better solved with python 3 and the @ operator). So in short for doing linear algebra, Julia is really more convenient to use.

link

targafarian 2981 days ago

This comment just sounds like "I know tool X well, somebody else doesn't know tool Y well, and therefore tool X is better."

To do matrix multiplication on many matrices "stacked" together in one step, use numpy.matmul: https://docs.scipy.org/doc/numpy/reference/generated/numpy.m... (and so there's no need to slice up the array, convert to matrix, etc.)

Note that the Numpy devs are trying to (if they haven't already) get rid of the "matrix" class and just use arrays, but of course dealing with legacy code is always an issue. Once that's out of the way, people won't be distracted by "matrices" to do matrix operations, and hopefully they'll see you can do matrix operations on arrays directly. (And yes, in Python 3 you can use the @ syntax to the same effect.)

link

kwertzzz 2981 days ago

Thank you for the info about numpy.matmul. However this feels quite clumsy to use a function call for a matrix operator, especially when you have several chained. I think that this is an good move to get rid of matrix class (I just didn't find any depreciation info in the documentation, maybe this still needs some time).

I think the python language is very elegant and the principle that there should be only one obvious way to do something has served them quite well. Unfortunately, in numpy you have quite often multiple ways to do things (often in addition to python own mechanism, e.g sum, numpy.sum and the sum method). I deal with students who have little programming experience and this can be confusing. One of the reasons I choose Julia for my lectures was that these issues do not exist in Julia. Julia is quite clean and simple in this regard.

However I completely see that for an experienced programmer (or a scientist with good programming experience), this is not a problem.

But for somebody learning to solve numerical problems, it is quite helpful that Julia code tends to be closer to the mathematical formulation. In addition, for the test I made, Julia code tends to be faster than vectorized numpy code (I can share the code if there is interest). The only major argument against Julia, in my opinion, is that it is still a young language and with a small ecosystem (much smaller in fact than python or R)

link

targafarian 2980 days ago

I would be curious about the code you use. Numpy was natural for me after going through engineering school, where Matlab was taught from year 2 on. Again, that was a language much more focused on the numerics. But as soon as I had to do something that wasn't numerical (first job out of school, and for everything since), I learned to hate Matlab and love Python.

Anyhow, that experience surely doesn't map onto Julia, a completely different language. So I'd be curious to see what your use case is; it might give me a different perspective on Julia (which I have only played with a couple of times back when it was even younger).

link

ChrisRackauckas 2983 days ago

Those are strengths that are shared with Julia, which is exactly why the question needs to be asked.

link

hprotagonist 2983 days ago

just as soon as someone who knows C/C++ ports numpy and scipy and pandas. and gensim and nltk and sounddevice. and tensorflow and scikit-learn and keras....

link

T-A 2983 days ago

https://github.com/JuliaPy/PyCall.jl

link

ssivark 2983 days ago

Julia libraries don’t need to be written in C/C++ for speed; Julia is good enough :-) This was one of Julia’s fundamental goals.

link

ChrisRackauckas 2983 days ago

That was done quite awhile ago? Julia now has a bunch of unique stuff Python doesn't have because the basics are already done.

link

joshuamorton 2983 days ago

Such as?

link

ChrisRackauckas 2983 days ago

The iterative linear solvers from IterativeSolvers.jl along with the preconditioner ecosystem is more expansive and uses genericness to have a lot more functionality (it's all able to be used with matrix-free operators, GPUs and Xeon Phis, arbitrary precision number choices along with complex, quaternions, etc.). The differential equations solvers from DifferentialEquations.jl covers a lot more domains than the stuff you'll find in SciPy+PyDSTool (SDEs, DAEs, DAEs, semi-linear ODEs via exponential integrators, IMEX, etc.). The dynamical systems library DynamicalSystems.jl is one of a kind. QuantumOptics.jl is not only faster than QuTIP but it also covers more areas like stochastic Schrodinger. And JuMP for mathematical programming (optimization) is also very good in comparison to Pyomo. Scientific computing's core is linear algebra, optimization, and diffeqs and right there you have the basics plus some widespread applications.

I agree that Python has a library advantage in data science + ML. R has a library advantage in the area of statistics. But Julia has quite a few advantages in the core math areas of scientific computing and algorithm development. There is headway being made into DS+ML as well. Julia's pandas/dataframe equivalent is JuliaDB which adds out-of-core and online stats functionality, so it's more at the level of pandas+dask. Flux.jl is still in its early stages but it's quite a unique ML framework which can directly incorporate any Julia function at any level, and then has some working experiments with compiling to things like JS and XLA.

But in the broad view of things, every language has SciPy+NumPy pretty satisfactory (ex: Julia's Base library has most of it, the top 20 packages cover the rest), but from there all have tradeoffs in what areas the community is specializing in.

link

joshuamorton 2983 days ago

Cool! To be clear, I realize that could have come across as an accusatory "prove it!", it wasn't meant that way. I've just never really needed anything not available in the scipy ecosystem, except for some obscure statistical methods (or not that obscure, but with a non-terrible api) that I don't think are available in julia either.

link

gnufx 2982 days ago

Why do you need numpy if you have linear algebra built in?

link

Derbasti 2982 days ago

I keep trying out Julia every year or so. And it has come a long way. Gone are the days of crashes, missing documentation, and terrible error messages.

And yet, for my particular area (audio signal processing), Julia is just objectively worse than Python in expressivity, library support, and even speed.

But I'll keep trying. Maybe next year.

link

ocschwar 2983 days ago

Julia is FOrtran-indexed, and thus anathema to my religion.

link

sampo 2983 days ago

In Fortran you can choose the starting index for each array separately. Even negative integers work.

(But it's true that idiomatic Fortran starts indexing from 1.)

link

ChrisRackauckas 2983 days ago

And the same is true in Julia.

link

rfeather 2983 days ago

Could you elaborate more on the advantages of a functional language?

link

cbcoutinho 2983 days ago

Scientific work should strive to be functional by definition (identical input == identical output), so I could see why a programing language should reflect that.

link

ianamartin 2983 days ago

That's a pretty broad statement. And it's a goal that a lot of scientific work simply doesn't allow for.

Any stochastic process will have variability for a given set of inputs. I think there are more fields within the umbrella of Science where purely functional programming isn't an achievable ideal--let alone a desirable one--than there are where this is a good fit.

You can make the argument that all programming should be functional on its own if you want, and I'm open to that. But this it's pretty sketchy to me to just shoehorn all scientific work into a category of work that should definitionally be functional.

There are huge numbers of scientists who do not agree with that at all.

link

dagw 2983 days ago

Any stochastic process will have variability for a given set of inputs

If you're modelling stochastic processes it's important to be able to set the random seed you can reproduce your simulations. So for a given set of inputs you should get the same output, given that one of the inputs is your seed.

link

dr_zoidberg 2983 days ago

Until you change how you consume randomness, and the order isn't the same anymore.

link

gnufx 2982 days ago

What are these fields in which you can't use functional programming and it isn't desirable? (It's true there should be more engineering done on implementations of scientific stuff.) I've seen how unnatural imperative programming was to physicists who hadn't been exposed to it.

link

gnufx 2982 days ago

While I agree on a functional outlook, you're going to have a hard time generally making efficient, deterministic parallel programs, particularly distributed ones.

link

moolcool 2983 days ago

Python has good support for a lot of functional concepts

link

UncleEntity 2983 days ago

Except tail-call optimization which the BDFL refuses to consider since it would break "there should be one -- and preferably only one -- obvious way to do it" regarding loops.

link

eigenspace 2983 days ago

Julia has much better support for functional programming.

link

xapata 2983 days ago

You want multi-line lambdas and macros? Anything else?

link

KronenR 2983 days ago

Give me an ecosystem like Python has and I won't switch anyways because for when that happens I will have more experience with Python, which is much more important.

link

HerrMonnezza 2983 days ago

Does anyone know how this compares to existing Python-to-C++ transpilers like Cython or Shedskin?

link

Arkanosis 2983 days ago

Cython is a bit different from CPython / Pythran / Shedskin in that you need to learn the Cython language, which is a Python-ish programming language, but not Python.

Shedskin and Pythran look somewhat similar to me (disclaimer: I've contributed quite a bit to Shedskin but have never used Pythran so far), except in Shedskin you don't even need annotations like with Pythran (the downside being the finer control you have of the native types used is through the transpiler options). Also, Shedskin development is not much active these days — to say the least — and there's zero support for Python 3, while Pythran is under fairly active development has beta support for Python 3.

If you're interested in Python / native implementations, you might be interested by Nuitka as well: http://nuitka.net/

link

HerrMonnezza 2983 days ago

Thanks for your answer!

However, I think this might be a bit misleading for people who do not know Cython:

> the Cython language, which is a Python-ish programming language, but not Python.

Actually, http://cython.org/ states:

""" The Cython language is a superset of the Python language that additionally supports calling C functions and declaring C types on variables and class attributes. """

In my experience with Cython, this description is quite accurate: code can be annotated with C types and then compiled to efficient C code by Cython; if you don't use annotations, then you can still compile to C code but with less speed advantage.

I haven't used Cython since version 0.17 (quite old now) but IIRC the major drawback was that it was mainly targeting writing extension modules for Python; it could generate self-standing executables, but would still require a Python interpreter to be embedded in any compiled code (that was the price for seamless interoperability between Cython/compiled code and "pure Python" code).

link

xapata 2983 days ago

> Python-ish

It's a little more Python-like than just -ish.

link