by andreasvc
4320 days ago
If you subscribe to the idea of a division of labor between systems and scripting languages, then Python plus its C-based extensions (Cython, NumPy, etc.) is ideal, as it already has an extensive scientific ecosystem. Julia combines the two in a single language. See this blog post and the one that follows it: http://graydon2.dreamwidth.org/3186.html
Yes, numpy and scipy do dispatch to precompiled C and sometimes Fortran loops, but the problem lies elsewhere, in the vectorization paradigm. It is just extremely wasteful. There are two problems:
(a) It is not expressive enough to capture efficient computation without generating unnecessary intermediate arrays whose sole purpose is to make it possible to write the computation as a vectorized operation. Unlike Matlab in the past, numpy and scipy are at least smart about broadcasting, which often lets you avoid constructing those intermediates in memory. However, this comes at the cost of an extra indirection through the stride vector that affects all array operations. You pay for that indirection whether you need it or not.
(b) The second problem is the generation of temporaries when you chain several binary operations. These temporaries get allocated, filled and destroyed over and over again within a single expression, which itself might be in a loop. This costs computation and memory, not to mention GC pressure. There is of course numexpr, but it is also quite limiting. For instance, you cannot index or slice from within a numexpr expression. It offers a limited set of reduce operations; you can use only one of them in an expression, and it must be the last one in the sequence of operations.
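Points (a) and (b) can be made concrete with a small sketch. It assumes NumPy is installed, and the function names `chained` and `in_place` are made up for illustration. It shows that a broadcast view is expressed through a zero stride rather than a copy, and that a chained expression can be rewritten by hand to reuse one preallocated buffer via the ufunc `out=` argument.

```python
import numpy as np

# (a) Broadcasting avoids materializing the repeated rows: the view below
# has shape (3, 4), but its first stride is 0, so every "row" points at the
# same 4 floats. The stride machinery is what makes this possible.
row = np.arange(4, dtype=np.float64)
view = np.broadcast_to(row, (3, 4))


def chained(a, b, c):
    # a * b is written to a freshly allocated temporary array. Depending on
    # the NumPy version and array size, the addition may reuse that
    # temporary or allocate again; either way each call allocates.
    return a * b + c


def in_place(a, b, c, out=None):
    # Hand-written alternative: reuse one output buffer for both steps.
    # Passing the same `out` on every call avoids per-call allocation.
    if out is None:
        out = np.empty_like(a)
    np.multiply(a, b, out=out)
    np.add(out, c, out=out)
    return out


a = np.arange(5, dtype=np.float64)
b = np.full(5, 2.0)
c = np.ones(5)

buffer = np.empty_like(a)
result = in_place(a, b, c, out=buffer)  # result is the same object as buffer
```

The in-place version gives the same values as the chained one, but it is also noticeably less readable, which is the trade-off being complained about here.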
Then there is 'write it in C'. If we have not eliminated the need to write C, we have not really solved the hard problems, have we? I think the whole point was to avoid writing low-level code because it is error-prone and tedious and often comes at the cost of productivity. The drop down to C imposes an unnecessary break in flow and forces you to tackle the impedance mismatch. Tools like Cython ease this a bit, but only a bit. You cannot, for example, use numpy array expressions efficiently from Cython; you have to write that tedious low-level indexing code yourself. If I were to write C, I would rather write it in C syntax and take advantage of the decades of tooling around C syntax. Cython is great and an awesome community effort, but it is still quite a simple compiler and has limitations.
So far I have been talking only about ease of use, quality of the programming experience and so on, but that is not the only issue here. The problem is that calls to C, and more importantly callbacks from C to Python, are expensive enough to be non-ignorable. If you have a hot loop where you go back and forth between the C/Fortran world and Python, that's going to incur a serious hit. The solution is to move the containing piece of code into C/Fortran/Cython, so it ends up swallowing more and more of the application logic, leaving only a shell of I/O in the Python world.
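A rough way to see the callback cost using only the standard library is sketched below. The two function names are hypothetical. The first routes every element through a Python-level lambda called from C-implemented `map`, and the second does the whole loop inside the C-implemented `sum`. No timings are quoted here since they depend on the machine and interpreter; run it to see the gap on your own setup.

```python
import timeit

data = list(range(100_000))


def python_callback_sum(values):
    # map() and sum() are implemented in C, but the lambda is Python code,
    # so the interpreter is re-entered once per element.
    return sum(map(lambda x: x * 2, values))


def single_c_call_sum(values):
    # The loop over the list runs entirely inside sum(); Python is entered
    # only once for the final multiplication.
    return 2 * sum(values)


if __name__ == "__main__":
    for fn in (python_callback_sum, single_c_call_sum):
        seconds = timeit.timeit(lambda: fn(data), number=10)
        print(f"{fn.__name__}: {seconds:.4f} s for 10 runs")
```

Both functions return the same value; only the number of crossings between C and Python differs.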
It's not the end of the world, but it's not quite the rosy picture you give either. Another issue is cultural: it's common among many newer programmers not to have really experienced fast runtimes. Of course this is a generalization and does not apply to all, but I have seen it happen frequently enough. They are greatly amazed by what I would call only a modest improvement in runtime, and they cheer "Wow! so speed much fast!" etc.
All of this makes me really hopeful about Julia. Interacting with the community gives me the feeling that they get it. Julia is an expressive language, already quite performant and not saddled by the limitations of vectorization. I do like the terseness of vectorized expressions over loops; that gap is being filled by devectorize.jl. Yes, there are more libraries available in R or Scipy, but given the ease with which one can code in Julia I don't see this as an insurmountable problem. Every language has to begin somewhere, and unlike, say, competing solutions like Torch7, I find the community very friendly, responsive and pragmatic. It seems they make a conscious effort to keep it that way. So, Julia community, here is wishing you all the best.
I do love Python a lot, and I mean really really a whole lot (except for its OOP parts), but this self-cheering gets a little too much at times.