Hacker News
by srean 4320 days ago
It is light years away from ideal. There are two variants of this much-parroted line: (i) 'Numpy and Scipy are just C loops', and (ii) 'just do the intensive parts in C'. Both leave a lot to be desired.

Yes, numpy and scipy do dispatch to precompiled C, and sometimes Fortran, loops, but the problem lies elsewhere: in their vectorization paradigm. It is just extremely wasteful. There are two problems:

(a) It is not expressive enough to capture efficient computation without generating unnecessary intermediate arrays whose sole purpose is to make it possible to write the computation as a vectorized operation. Unlike Matlab in the past, numpy and scipy are at least smart about broadcasting. This often allows one to avoid constructing those intermediates in memory. However, this comes at the cost of an extra indirection, via the stride vector, that affects all array operations. You pay the cost of indirection whether you need it or not.

(b) The second problem is the generation of temporaries when you chain several binary operations. These temporaries get allocated, filled and destroyed over and over again within a single expression, which itself might be in a loop. This costs computation and memory, not to mention GC pressure. There is of course numexpr, but it is also quite limiting. For instance, you cannot index or slice from within a numexpr expression. It offers a limited set of reduce operations; you can use only one in an expression, and it must be the last one in the sequence of operations.
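
To make (a) and (b) concrete, here is a minimal sketch, assuming numpy is installed; the array names and sizes are made up for illustration. It shows the temporary that a chained expression materializes, a manual rewrite using the `out=` argument that reuses one buffer, and the zero stride that broadcasting relies on.

```python
import numpy as np

n = 1_000
a = np.linspace(0.0, 1.0, n)
b = np.linspace(1.0, 2.0, n)
c = np.linspace(2.0, 3.0, n)

# Point (b): in the chained form, a * b is materialized as a full
# temporary array before the addition runs.
chained = a * b + c

# Manual form: one preallocated buffer, reused for both steps.
buf = np.empty_like(a)
np.multiply(a, b, out=buf)  # buf = a * b, written into existing memory
np.add(buf, c, out=buf)     # buf = buf + c, in place

# Point (a): broadcasting avoids copying the row by giving the
# broadcast axis a stride of 0 bytes.
row = np.arange(4.0)                  # float64, 8 bytes per element
tiled = np.broadcast_to(row, (3, 4))  # a read-only view, no data copied
```

The `out=` version avoids the temporary but gives up the readable one-line expression, which is the trade-off being complained about.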

Then there is 'write it in C'. If we have not eliminated the need to write C, we have not really solved the hard problems, have we? I think the whole point was to avoid writing low level code, because it is error prone and tedious and often comes at the cost of productivity. The drop down to C imposes an unnecessary break in flow and forces you to tackle the impedance mismatch. Tools like Cython ease this a bit, but only so far. You cannot, for example, use numpy array expressions efficiently from Cython; you have to write that tedious low level indexing code. If I were to write C, I would rather write it in C syntax and take advantage of the decades of tooling around C syntax. Cython is great and an awesome community effort, but it is still quite a simple compiler and has limitations.

So far I have been talking only about ease of use, quality of programming experience, etc., but that is not the only issue here. The problem is that calls to C and, more importantly, callbacks from C to Python are expensive enough to be non-ignorable. If you have a hot loop where you go back and forth between the C/Fortran world and Python, that is going to incur a serious hit. The solution is to turn the containing piece of code into C/Fortran/Cython, so it ends up swallowing more and more of the application logic, leaving only a shell of I/O in the Python world.

It's not the end of the world, but not quite the rosy picture you give. Another issue is cultural: it is common among many newer programmers not to have really experienced fast runtimes. Of course this is a generalization and does not apply to all, but I have seen it happen frequently enough. They are greatly amazed by what I would call only a modest improvement in runtimes, and they would be cheering "Wow! so speed much fast!" etc.

All of this makes me really hopeful about Julia. Interacting with the community gives me the feeling that they get it. Julia is an expressive language, already quite performant and not saddled by the limitations of vectorization. I do like the terseness of vectorized expressions over loops; this gap is being filled by devectorize.jl. Yes, there are more libraries available in R or Scipy, but given the ease with which one can code in Julia I don't see this as an insurmountable problem. Every language has to begin somewhere and, unlike say other competing solutions like Torch7, I find the community very friendly, responsive and pragmatic. It seems they spend conscious effort to keep it that way. So, Julia community, here is wishing you all the best.

I do love Python a lot, and I mean really really a whole lot (except for its OOP parts) but this self cheering gets a little too much at times.

2 comments

I have used CFFI in a few projects for writing a core computation in actual, pure C, and dynamically calling it from Python. Performance is really great, I vastly prefer it over Cython.

But, it doesn't solve the vectorization problem. Often times, you have to transform simple iterative algorithms to a convoluted vectorized mess if you want to have decent performance. Not every procedure is easily expressed as matrix multiplication.

This is a very real cognitive tax of the vectorized approach. Thank you for your writeup!
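
A small illustration of the kind of loop meant here, assuming exponential smoothing as the example (the function name `exp_smooth` is hypothetical). Each output depends on the previous output, so it is not a plain elementwise array expression, while the loop form is a few lines:

```python
def exp_smooth(xs, alpha):
    """Exponential smoothing: y[0] = x[0], y[i] = alpha*x[i] + (1-alpha)*y[i-1]."""
    out = []
    prev = None
    for x in xs:
        # The first sample passes through; later samples mix with the
        # previous *output*, which is what blocks a simple elementwise form.
        prev = x if prev is None else alpha * x + (1.0 - alpha) * prev
        out.append(prev)
    return out

smoothed = exp_smooth([1.0, 2.0, 3.0], alpha=0.5)
```

In pure Python this loop is slow for large inputs, and that slowness is the pressure to rewrite it as a less obvious vectorized form or to move it into C.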

But why do you prefer it over Cython, since it can also wrap "actual, pure C, and dynamically [call] it from Python"?

What is the vectorization problem? Both Cython and Julia allow you to write tight loops and have them perform well.

In CFFI, you write either Python, or C. In Cython, you additionally write a weird dialect that is neither C nor Python. I prefer straight C over Cython.

Your points (a) and (b) are specific to numerical code; perhaps Numba addresses these problems well but it's not something I've worked with.

With regard to C and Cython, I wouldn't say "the whole point was to avoid writing low level code", because that is what can give you the most performance in the end. If you do want to avoid low level code then you'll prefer a single-language approach such as Julia or Java, but this means trading the bare-to-the-metal performance for having a VM and JIT between the code and the machine.

The argument of preferring writing C code to Cython code is moot because you have the choice of writing code in Cython or using it to wrap existing C code. While the second option allows you to write in C as you say you'd prefer, the first option offers seamless integration of C code and manipulation of Python data structures, so there is an advantage to using Cython not just for wrappers but also for code.

I'm a bit at a loss as to why you would claim I gave a rosy picture, or what this "self cheering" is about (note that my use of "ideal" was part of a conditional). I actually presented it as a dilemma (systems & scripting language, or a single language), and I think it's not clear which approach is better or whether the fact that there seems to be a dilemma is accidental or necessary.

The 'cheering' part was not directed at you; I should have made that clearer, and indeed my comments are specific to numeric code. You mention Java. It so happens that I commented on why Java is not a good solution for numerics just yesterday: https://news.ycombinator.com/item?id=8214922 However, if you follow that thread you will see that it is not yet clear whether writing C is what gives you the most performant and correct code.

> The argument of preferring writing C code to Cython code is moot because you have the choice of writing code in Cython or using it to wrap existing C code.

I am not convinced about this, and I have mentioned why I am not so enamored of this approach. Although quite a feat in itself, Cython is not a very sophisticated compiler, so if you are writing C code you are better off taking charge and writing the C yourself. You get to enjoy the tooling that has accumulated over the years around the C language. Otherwise you often end up in a situation where you have to debug the autogenerated C, which is not very pleasant. The second reason is that the bridge between Cython and C is by no means cheap. Note that Cython's objective is to produce Python C modules, not native C binaries, so it will talk to C with all the disadvantages of talking to C from a Python C module. Cython is indeed great if you have legacy code in Python that you want to marry to C, or to get some speedup with minimal intervention, but if you are not so constrained and want more speedup than this affords, perhaps better approaches are needed.

I would also point out that Julia does not always yield faster solutions than Cython yet. My main motivation was to point out some design issues that you are saddled with when you are a denizen in the Python world.

Indeed, Cython is not a very sophisticated compiler (although it does benefit from existing optimized C compilers), but the same argument goes for Julia. That's a matter of implementation for which there is room for improvement; the interesting discussion is which is the more appropriate architecture.

As for debugging, I admit, it can get ugly with Cython, but in my experience this only happens when you decide to do low level manual memory management. This ability to shoot yourself in the foot is part of the trade-off of close-to-the-metal performance. It's not pretty but then again descending to that level is entirely optional.

You say that the bridge between C and Cython is not cheap, but this characterization is mistaken. It is easy to write Cython code that maps 1-to-1 to C code without using any part of Python whatsoever. What is expensive is bridging back to Python; e.g., calling a Python function requires constructing a tuple and all Python objects are heap allocated. However, you can choose to use this bridge as little as you want (the extreme case is only using Python to call a main function defined in Cython).

You argue that Cython only produces modules but not native binaries. In fact this is not true: it can produce such binaries, but that implies including a Python interpreter as part of the binary (--embed option).

> and want more speedup than this affords perhaps better approaches are needed.

What are you alluding to here? Cython offers the speed of pure C/Fortran. I can think of 2 limitations: calling back and forth from C code to Python code is expensive (but then don't do that frequently, if it's a tight loop it's worth optimizing), and JIT optimizations.

> the interesting discussion is which is the more appropriate architecture.

I am quite convinced that between the two, Julia's is the better way. I don't think you will be convinced, so I will leave this thread with this last comment.

With Julia's JIT, macros, multiple dispatch and type specialization there is a whole world of things that you can do in Julia _now_ that you cannot do in the Cython/Python split world. Another advantage Julia has is that it is not saddled with Python in the way that Cython is. You might consider this an unfair advantage though.

> As for debugging...

I think you are coming from a position that allows you to brush such issues aside with "low level is hard, so suck it up. That experience is going to be bad anyway".

I disagree. First, with Julia I probably won't need to drop down to that level as often. Secondly, Cython takes away one major redeeming quality of Python in the numeric context: Numpy array syntax. I cannot use that any more in any performant sense because that would call back into the Numpy API. So now I have to write that indexing code in low level C in Cython syntax. Thirdly, if you give me the full power of C or C++ I can manage to get low level with less complexity than the Python / Cython split world and with fewer things saddled on me.

Why do you think that it is better to talk to C through those limitations?

If I do, I would be writing C in a Python syntax that supports some fuzzy subset of C and some fuzzy subset of Python, which will then get compiled by a simplistic compiler to produce quite a sizable C code which I would then compile with a C compiler to get a module with questionable debugging support. Compared to Julia this looks clearly worse to me, you may feel otherwise.

If I really need to write C I would prefer to have full C at my disposal without multi-language split braining. I would like to speak to the C compiler without an indirection through another compiler.

I stand corrected about Cython's ability to produce a binary, but I don't find the argument "oh by the way it will come with the Python interpreter" convincing unless I really do want to embed a Python interpreter. Don't get me wrong, Cython is awesome if you want to integrate C with Python; I have already said this before. It's great if you have legacy Python code, or co-workers who are unfamiliar with or unwilling to work with C. But when you are free of such constraints, Cython only saddles you with more.

As for the speed of the Cython_module <---> Python bridge, I think we disagree about what is fast. Take a concrete example: gradient descent code. One way to do it is to have the gradient descent hot loop as a Cython function that takes two Python callbacks: the function that you are trying to minimize, and another to compute the gradient (after all, the numpy syntax is nice for such things). If you do this the speed is going to be abominable. The next option is to have the loop in Python but have the objective function and the gradient function in Cython. Even if you manage to bind these two names in the closest scope possible, Python will look up the names again and again before calling them through a dynamic interface, and it's a Python loop, not known for speed. Furthermore, in that Cython-implemented function I have lost the pleasant syntax of Numpy. Furthermore, this bridge is a compiler optimization barrier. So the only really viable option is to convert the containing loop to Cython, and after that what remains? If this is what is required I would have just written the whole thing in plain C or C++ and had the full language and tooling at my fingertips. What additional advantage is Cython giving me here? It is not without advantages: one I have already mentioned, integration with Python code; another is prototyping. With Julia the latter is taken care of, and the former is covered somewhat. Although if I had a strong need to play well with Python _now_ I would choose Cython over Julia.
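
A minimal sketch of the first layout described above, a hot loop that takes the objective and the gradient as callbacks. It is plain Python with a hypothetical one-dimensional objective and made-up names, not the code that was measured; in the Cython variant the loop itself would be compiled, but every `f(x)` and `grad(x)` call would still cross back into Python.

```python
def gradient_descent(f, grad, x0, step=0.1, iters=200):
    """Fixed-step gradient descent driven by two user-supplied callbacks."""
    x = x0
    best = f(x)
    for _ in range(iters):
        x = x - step * grad(x)   # one gradient callback per iteration
        best = min(best, f(x))   # one objective callback per iteration
    return x, best

# Hypothetical objective f(x) = (x - 3)^2 and its gradient 2 * (x - 3).
def f(x):
    return (x - 3.0) ** 2

def grad_f(x):
    return 2.0 * (x - 3.0)

x_min, f_min = gradient_descent(f, grad_f, x0=0.0)
```

The two calls inside the loop are the boundary in question: whichever side of the Python/compiled split the loop lives on, those calls cross it on every iteration unless the callbacks move to the same side.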

The example was by no means hypothetical. I have done this, and the code's speed improved by an order of magnitude when I redid it entirely in C++. Doing away with the lookups by itself sped it up, and when I coaxed the compiler to optimize across the boundary, in particular to inline the functions, that is what gave the order of magnitude improvement. Julia's design permits such things to happen without the split-brain problem.