I'm curious. I would like to know if it is worth learning a new language only for speed reasons, or if it's better to hope for improvements in Python... What do you think?
I wouldn't expect mainline Python to get significantly faster. You can get huge improvement in speed by using various libraries, alternative interpreters, and other strategies - but you'll always be going beyond "pure", standard Python.
Julia is designed with numerical performance in mind, and gets impressive results out of the box. But since Python can also be made fast by the means above, speed by itself would probably not be a compelling reason to learn a new language. The compelling reasons would be found in Julia's design, which may make it preferable to you once you've learned more about it. Read up on multiple dispatch, built-in networked computing, and array syntax. I find the design around multiple dispatch to be much more sensible than Python's object-oriented approach.
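For anyone who hasn't met multiple dispatch before, here is a rough sketch of the idea in Python. This is a toy, not how Julia implements it: the `dispatch` decorator and the `combine` function are names I made up for illustration, and it only matches exact argument types (no subtypes, no ambiguity resolution). The point is just that the implementation is chosen from the types of all arguments, not only the first one.

```python
# Toy multiple dispatch: pick an implementation based on the types of ALL arguments.
# `dispatch` and `combine` are hypothetical names, not part of any library.
_registry = {}


def dispatch(*types):
    """Register the decorated function for an exact tuple of argument types."""
    def register(fn):
        _registry[(fn.__name__, types)] = fn

        def dispatcher(*args):
            # Look up the implementation by function name and runtime argument types.
            key = (fn.__name__, tuple(type(a) for a in args))
            impl = _registry.get(key)
            if impl is None:
                raise TypeError(f"no method {fn.__name__} for {key[1]}")
            return impl(*args)

        return dispatcher
    return register


@dispatch(int, int)
def combine(a, b):
    return a + b


@dispatch(str, int)
def combine(a, b):
    return a * b


@dispatch(int, str)
def combine(a, b):
    return b * a
```

With Python's usual object-oriented style, `combine` would have to live on one of the two classes and inspect the other argument by hand; here each type combination gets its own small definition.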
>I wouldn't expect mainline Python to get significantly faster.
Why not? JS did. And before you mention that it's the de facto language of the web, so of course it had tons of commercial backing, well, PHP got faster too (with PHP7, but also with the 3 most recent releases really working on speed and lower memory use).
We have this Dropbox initiative too, and renewed talk about Python optimization.
For Python it's an obvious shortcoming, with some low-hanging fruit available.
The tricky thing about Python is that it's a classic two-language system: the performance-critical extensions to the language are all implemented in C. That was never the case with JavaScript, since there was no way to extend it in C. So the V8 team was free to do anything they wanted to make JS fast, so long as it still behaved like JS. Alex Rubinsteyn explained the problem far better than I will manage to in this blog post:
The conclusion is that if you do all the things you can to make Python faster without breaking compatibility with CPython (thereby losing all of SciPy):
> "In the best cases, such as when lots of integers are being manipulated in a loop, [you] might get up to 3X faster than the regular Python interpreter. More often, the gains hover around a meager 20%."
One way might be to make the regular C extension API have a performance drop (e.g use it through some translation layer facade) and introduce a new C extension API alongside it that plays well with the necessary changes in the language.
Honestly, I just don't think this is going to happen in Python, but I could very well be wrong. The biggest impediment has been Guido van Rossum's (IMO understandable) reluctance to accept performance improvements that increase the complexity of the standard implementation. The fact that he works at Dropbox, which is also the home of the Pyston project, gives some hope of him being more supportive of this kind of thing in the future, but that's pretty thin.
Julia's a nice language even without the speed improvements. It's less overtly object oriented than Python, which could be a pro or con for you. I like its multiple dispatch, for example.
It is interesting. I held off on learning Python because Matlab was there (paid by school/employer), it ran reasonably fast, and all the libraries (toolboxes) were there.
Now I'm trying to jump to Julia, and many of my Python friends are making the exact same arguments (you'll never get the community/libraries/support) that used to be made against Python in favor of Matlab.
Anyway, as a long time Matlab user, Julia just feels incredibly 'natural' to me.
If you subscribe to the idea of a division of labor between systems and scripting languages, then Python + its C based extensions (Cython, Numpy, etc.) is ideal, as it already has an extensive scientific ecosystem. Julia combines the two in a single language. See this blog post and the one that follows it: http://graydon2.dreamwidth.org/3186.html
It is light years from ideal. There are two variants of this much parroted line: (i) 'Numpy and Scipy are just C loops', and (ii) 'just do the intensive parts in C'. Both leave a lot to be desired.
Yes numpy, scipy indeed dispatch to precompiled C and sometimes Fortran loops but the problem lies elsewhere, in its vectorization paradigm. It is just extremely wasteful. There are two problems:
(a) It is not expressive enough to capture efficient computation without generating unnecessary intermediate arrays whose sole purpose is to make it possible to write the computation as a vectorized operation. Unlike Matlab in the past, numpy and scipy are at least smart about broadcasting. This often allows one to avoid constructing those intermediates in memory. However, this comes at the cost of an extra indirection, via the stride vector, that affects all array operations. You pay the cost of indirection whether you need it or not.
(b) The second problem is the generation of temporaries when you chain several binary operations. These temporaries get allocated, filled and destroyed over and over again within a single expression, which itself might be in a loop. This costs computation and memory, not to mention GC pressure. There is of course numexpr, but it is also quite limiting. For instance, you cannot index or slice from within a numexpr expression. It offers a limited set of reduce operations; in an expression you can use only one, and it must be the last one in the sequence of operations.
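To make (b) concrete, here is a small sketch. It deliberately does not use NumPy: `Vec` is a toy stand-in I made up that simply counts how many buffers it allocates, so the numbers say nothing about real NumPy timings. It only shows the shape of the problem: a chained expression built from binary operators allocates one buffer per operator, while a fused loop needs only the output.

```python
class Vec:
    """Toy stand-in for an array type (NOT NumPy); counts every buffer allocated."""
    allocations = 0

    def __init__(self, data):
        self.data = list(data)
        Vec.allocations += 1

    def __add__(self, other):
        return Vec(a + b for a, b in zip(self.data, other.data))

    def __mul__(self, other):
        return Vec(a * b for a, b in zip(self.data, other.data))


def chained(a, b, c, d):
    # a * b allocates a temporary, c * d allocates another,
    # and the final sum allocates the result buffer.
    return a * b + c * d


def fused(a, b, c, d):
    # One pass over the inputs, one output buffer, no intermediates.
    return Vec(w * x + y * z for w, x, y, z in zip(a.data, b.data, c.data, d.data))


def count_allocations(fn, *args):
    """Run fn and report how many Vec buffers it allocated."""
    Vec.allocations = 0
    result = fn(*args)
    return result, Vec.allocations
```

Calling `count_allocations(chained, a, b, c, d)` on four `Vec` inputs reports three allocations, against one for `fused`. The fused version is what you would like the vectorized expression to compile down to.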
Then there is 'write it in C'. If we have not eliminated the need to write C, we have not really solved the hard problems, have we? I think the whole point was to avoid writing low level code, because it is error prone and tedious and it often comes at the cost of productivity. The drop down to C imposes an unnecessary break in flow and forces you to tackle the impedance mismatch. Tools like Cython ease this a bit, but only so far. You cannot, for example, use numpy array expressions efficiently from Cython; you have to write that tedious low level indexing code. If I were to write C, I would rather write it in C syntax and take advantage of the decades of tooling around C syntax. Cython is great and an awesome community effort, but it is still quite a simple compiler and has limitations.
So far I have been talking only about ease of use, quality of programming experience and so on, but that is not the only issue here. The problem is that calls to C, and more importantly callbacks from C to Python, are expensive enough to be non-ignorable. If you have a hot loop where you go back and forth between the C/Fortran world and Python, that's going to incur a serious hit. The solution is to turn the containing piece of code into C/Fortran/Cython, so it ends up swallowing more and more of the application logic, leaving only a shell of I/O in the Python world.
It's not the end of the world, but it's not quite the rosy picture you give. Another issue is cultural: it's common for many newer programmers never to have really experienced fast runtimes. Of course this is a generalization and does not apply to all, but I have seen it happen frequently enough. They are greatly amazed by what I would call only a modest improvement in runtimes, and they cheer "Wow! so speed much fast!" etc.
All of this makes me really hopeful about Julia. Interacting with the community gives me the feeling that they get it. Julia is an expressive language, already quite performant and not saddled with the limitations of vectorization. I do like the terseness of vectorized expressions over loops; this gap is being filled by devectorize.jl. Yes, there are more libraries available in R or Scipy, but given the ease with which one can code in Julia I don't see this as an insurmountable problem. Every language has to begin somewhere, and unlike, say, other competing solutions like Torch7, I find the community very friendly, responsive and pragmatic. It seems they make a conscious effort to keep it that way. So, Julia community, here is wishing you all the best.
I do love Python a lot, and I mean really really a whole lot (except for its OOP parts) but this self cheering gets a little too much at times.
I have used CFFI in a few projects for writing a core computation in actual, pure C, and dynamically calling it from Python. Performance is really great, I vastly prefer it over Cython.
But, it doesn't solve the vectorization problem. Often times, you have to transform simple iterative algorithms to a convoluted vectorized mess if you want to have decent performance. Not every procedure is easily expressed as matrix multiplication.
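A small sketch of what I mean, in plain Python so it stays self-contained. The example (a running sum that is clamped at zero) and the function names are my own choice for illustration. The loop version is the obvious one; the second version is the kind of prefix-sum trick you end up hunting for if you must express it as whole-array operations.

```python
from itertools import accumulate


def clamped_sum_loop(xs):
    """Obvious iterative version: s_t = max(0, s_{t-1} + x_t), starting from 0."""
    out, s = [], 0
    for x in xs:
        s = max(0, s + x)
        out.append(s)
    return out


def clamped_sum_vectorized_style(xs):
    """Same result using only whole-sequence operations (cumsum, running min).

    Relies on the identity s_t = C_t - min(0, min_{k<=t} C_k),
    where C_t is the plain cumulative sum.
    """
    cums = list(accumulate(xs))              # cumulative sum
    running_min = list(accumulate(cums, min))  # running minimum of the cumsum
    return [c - min(0, m) for c, m in zip(cums, running_min)]
```

Both return the same values, but the second one takes a moment to convince yourself of, and it makes three passes with two intermediate sequences where the loop makes one.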
This is a very real cognitive tax of the vectorized approach. Thank you for your writeup!
In CFFI, you write either Python, or C. In Cython, you additionally write a weird dialect that is neither C nor Python. I prefer straight C over Cython.
Your points (a) and (b) are specific to numerical code; perhaps Numba addresses these problems well but it's not something I've worked with.
With regards to C and Cython, I wouldn't say "the whole point was to avoid writing low level code", because that is what can give you the most performance in the end. If you do want to avoid low level code then you'll prefer to approach with a single language such as Julia or Java, but this means trading the bare-to-the-metal performance for having a VM and JIT between the code and the machine.
The argument for preferring to write C code over Cython code is moot, because you have the choice of writing code in Cython or using it to wrap existing C code. While the second option allows you to write in C as you say you'd prefer, the first option offers seamless integration of C code and manipulation of Python data structures, so there is an advantage to using Cython not just for wrappers but also for code.
I'm a bit at a loss as to why you would claim I gave a rosy picture, or what this "self cheering" is about (note that my use of "ideal" was part of a conditional). I actually presented it as a dilemma (systems & scripting language, or a single language), and I think it's not clear which approach is better or whether the fact that there seems to be a dilemma is accidental or necessary.
The 'cheering' part was not directed at you, I should have made that clearer and indeed my comments are specific to numeric code. You mention Java, it so happened that I commented why Java is not a good solution for numerics just yesterday https://news.ycombinator.com/item?id=8214922 However, if you follow that thread you would see that it is not clear yet whether writing C is what would give you the most performant and correct code.
> The argument of preferring writing C code to Cython code is moot because you have the choice of writing code in Cython or using it to wrap existing C code.
I am not convinced about this, and I have mentioned why I am not so enamored of this approach. Although quite a feat in itself, Cython is not a very sophisticated compiler, so if you are writing C code you are better off taking charge and writing the C yourself. You get to enjoy the tooling that has accumulated over the years around the C language. Otherwise you often get yourself into a situation where you have to debug the autogenerated C, which is not very pleasant. The second reason is that the bridge between Cython and C is by no means cheap. Note that Cython's objective is to produce Python C modules, not native C binaries, so it will talk to C with all the disadvantages of talking to C from a Python C module. Cython is indeed great if you have legacy code in Python that you want to marry to C, or if you want some speedup with minimal intervention, but if you are not so constrained and want more speedup than this affords, perhaps better approaches are needed.
I would also point out that Julia does not always yield faster solutions than Cython yet. My main motivation was to point out some design issues that you are saddled with when you are a denizen in the Python world.
Indeed, Cython is not a very sophisticated compiler (although it does benefit from existing optimized C compilers), but the same argument goes for Julia. That's a matter of implementation for which there is room for improvement; the interesting discussion is which is the more appropriate architecture.
As for debugging, I admit, it can get ugly with Cython, but in my experience this only happens when you decide to do low level manual memory management. This ability to shoot yourself in the foot is part of the trade-off of close-to-the-metal performance. It's not pretty but then again descending to that level is entirely optional.
You say that the bridge between C and Cython is not cheap, but this characterization is mistaken. It is easy to write Cython code that maps 1-to-1 to C code without using any part of Python whatsoever. What is expensive is bridging back to Python; e.g., calling a Python function requires constructing a tuple and all Python objects are heap allocated. However, you can choose to use this bridge as little as you want (the extreme case is only using Python to call a main function defined in Cython).
You argue that Cython only produces modules but not native binaries. In fact this is not true, it can produce such binaries but that implies including a Python interpreter as part of the binary (--embed option).
> and want more speedup than this affords perhaps better approaches are needed.
What are you alluding to here? Cython offers the speed of pure C/Fortran. I can think of 2 limitations: calling back and forth from C code to Python code is expensive (but then don't do that frequently, if it's a tight loop it's worth optimizing), and JIT optimizations.
> the interesting discussion is which is the more appropriate architecture.
I am quite convinced that between the two, Julia's is the better way. I don't think you will be convinced, so I will leave this thread with this last comment.
With Julia's JIT, macros, multiple dispatch and type specialization, there is a whole world of things that you can do in Julia _now_ that you cannot do in the Cython/Python split world. Another advantage that Julia enjoys is that it is not saddled with Python in the way that Cython is. You might consider this an unfair advantage though.
> As for debugging...
I think you are coming from a position that allows you to brush such issues aside with "low level is hard, so suck it up. That experience is going to be bad anyway"
I disagree. First, with Julia I probably won't need to drop down to that level as often. Secondly, Cython takes away one major redeeming quality of Python in the numeric context: Numpy array syntax. I cannot use that any more in any performant sense, because that would call back into the Numpy API. So now I have to write that indexing code in low level C in Cython syntax. Thirdly, if you give me the full power of C or C++, I can manage to get low level with less complexity than in the Python / Cython split world, and with fewer things saddled on me.
Why do you think that it is better to talk to C through those limitations?
If I do, I would be writing C in a Python syntax that supports some fuzzy subset of C and some fuzzy subset of Python, which will then get compiled by a simplistic compiler to produce quite a sizable C code which I would then compile with a C compiler to get a module with questionable debugging support. Compared to Julia this looks clearly worse to me, you may feel otherwise.
If I really need to write C, I would prefer to have full C at my disposal without multi-language split-braining. I would like to speak to the C compiler without an indirection through another compiler.
I stand corrected about Cython's ability to produce a binary, but I don't find the argument "oh by the way it will come with the Python interpreter" compelling unless I really do want to embed a Python interpreter. Don't get me wrong, Cython is awesome if you want to integrate C with Python; I have already said this before. It's great if you have legacy Python code, or co-workers who are unfamiliar with or unwilling to work with C. But when you are free of such constraints, Cython only saddles you with more.
As for the speed of the Cython_module <---> Python bridge, I think we disagree about what is fast. Take the concrete example of gradient descent code. One way to do it is to have the gradient descent hot loop as a Cython function that takes two Python callbacks: the function that you are trying to minimize, and another to compute the gradient (after all, the numpy syntax is nice for such things). If you do this, the speed is going to be abominable. The next option is to have the loop in Python but have the objective function and the gradient function in Cython. Even if you manage to bind these two names in the closest scope possible, Python will look up the names again and again before calling them through a dynamic interface, and it's a Python loop, not known for speed. Furthermore, in that Cython implemented function I have lost the pleasant syntax of Numpy. On top of that, this bridge is a compiler optimization barrier. So the only really viable option is to convert the containing loop to Cython, and after that, what remains? If this is what is required, I would have just written the whole thing in plain C or C++ and had the full language and tooling at my fingertips. What additional advantage is Cython giving me here? It is not without advantages: one I have already mentioned, integration with Python code; another is prototyping. With Julia the latter is taken care of, and the former is covered somewhat. Although if I have a strong need to play well with Python _now_, I would choose Cython over Julia.
The example was by no means hypothetical. I have done this, and code speed improved by an order of magnitude when I redid it entirely in C++. Doing away with the lookups by itself sped it up, and when I coaxed the compiler to optimize across the boundary, in particular to inline the functions, that is what gave an order of magnitude improvement. Julia's design permits such things to happen without the split-brain problem.
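For readers who want to see the shape of the gradient descent setup being discussed, here is a minimal sketch in plain Python. It is not my original code: the function names, the fixed step size and the stopping rule are all assumptions made for illustration. The `counted` wrapper is only there to show how many times the hot loop crosses into a callback, which is exactly the boundary that becomes expensive when the loop and the callbacks live in different worlds.

```python
def counted(fn, counter):
    """Wrap fn so that every call increments counter['n'] (illustration only)."""
    def wrapper(x):
        counter["n"] += 1
        return fn(x)
    return wrapper


def gradient_descent(f, grad, x0, step=0.1, tol=1e-10, max_iter=10_000):
    """Hot loop taking two callbacks: the objective f and its gradient grad.

    Returns (x, iterations). Each iteration calls grad once and f twice,
    so the loop crosses the callback boundary three times per step.
    """
    x = x0
    iterations = 0
    for _ in range(max_iter):
        iterations += 1
        x_new = x - step * grad(x)
        # Stop when the objective no longer changes meaningfully.
        if abs(f(x_new) - f(x)) < tol:
            return x_new, iterations
        x = x_new
    return x, iterations


def objective(x):
    return (x - 3.0) ** 2


def objective_grad(x):
    return 2.0 * (x - 3.0)
```

Usage would look like `calls = {"n": 0}` followed by `gradient_descent(counted(objective, calls), counted(objective_grad, calls), x0=0.0)`; afterwards `calls["n"]` is three times the iteration count. In pure Python those calls are merely slow; across a Cython/Python boundary each one also blocks inlining.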
That was a really nice read (it's a quick history of programming languages through a particular lens, with a glowing endorsement of Julia at the end). I couldn't find the link to the second post (which is the one about Julia), it's this:
Do one thing and do it well, seems like a good idea. Of course, do everything better than anyone else is even better, but is it really possible? I'm not sold on the idea of a one-size-fits-all programming language.
Well, if there was some theoretical barrier, and if we were indeed talking of making the "100% perfect" language, yes.
But in real life things preventing using a language in more domains are mostly mere bad design, lack of some features and legacy baggage.
Case in point with Python. It's not that we want it to be faster or lower level than C to use in kernels.
We just want it to be 10x, 20x faster, something entirely possible (Javascript JITs prove it can be done for even the most dynamic of languages), and we just want it to be able to tap into multicore CPUs (something that would also be entirely possible if it was slightly better designed).
So it's not like there's some huge theoretical barrier to achieving this stuff. It's mostly money (like that spent by Apple, Google and Mozilla on their JIT JS engines), and the will to drop some backwards compatibility (which they did with 3.0, but for marginal benefits instead of addressing useful stuff).
It depends what you want. If you think of the ordinary CPython interpreter, that is unlikely to ever reach the speed of Julia, and I don't think that is even a priority for the Python developers. So if execution speed is very important for you, go for Julia.
On the other hand, it is not likely that Julia will even come close to having an ecosystem of libraries similar to the one you can find in Python in the foreseeable future. The exception is numerics-heavy fields, which are Julia's focus area, where there are a lot of good libraries.
Thanks to all. I'm studying AI, so maybe there will be something good in Julia for me, BUT we are only now getting good libraries for Python.
They are porting good R libraries, there is Theano, pandas, scikit-learn, pylearn2 etc. With Julia? Let's see but it's a shame we start again :(((