Hacker News new | ask | show | jobs
by codewardenh 3147 days ago
Working with these libraries and the ecosystem has been great. We tried Julia for a while but found it sorely lacking whereas python has such a great community and history and wealth of experience and wisdom to share.
2 comments

I have had the opposite experience since most decent Python packages seem to have their own implementation of a parser, JIT, etc. in order to have any speed, making them monolithic monsters that are hard to contribute to and are hard to modify. Python's ecosystem seems to handle the basic cases very well, but when I wanted to "go to research land" in pretty much any scientific computing subject (mathematical optimization, numerical linear algebra, or differential equations), I quickly hit a wall that would require I write one of these monoliths myself in C++. I haven't hit any walls in Julia, but then again if you stay in the basic standard equations/models YMMV.

But SciPy is a great project and everyone can learn something from their successes.

I don't really get your point about monoliths in this specific usecase. Let's say you're exploring with NumPy. What stops you from

(a) writing your research as a bunch of Fortran or C kernels and intrgrate them with automatic bindings such as f2py? Especially Fortran is a great fit for NumPy datastructures - because they are the same.

(b) Use high performance python environments like Numba, NumbaPro (GPU) or even Cython?

>(a) writing your research as a bunch of Fortran or C kernels and intrgrate them with automatic bindings such as f2py? Especially Fortran is a great fit for NumPy datastructures - because they are the same.

If I have to write everything important and difficult in Fortran or C kernels, then I'm losing the productivity of the higher level language. And in this case it would be writing kernels from scratch instead of tweaking high level package code if I want to change data structures and implement new algorithms for existing problems. This makes it hard to directly re-purpose the internal linesearch code from the SciPy optimizers without Python in-between into my own Fortran optimizer, or add new projections and stuff to existing SciPy implementations and have that be a separately maintained package but dependent on SciPy.

Even just talking about this starts to sound like a mess from the past, but fortunately this problem is already solved by Julia so there's no reason to have to deal with it.

>(b) Use high performance python environments like Numba, NumbaPro (GPU) or even Cython?

Well the biggest thing against Numba is once again you have to write everything from scratch if you want to do things non-trivial. They finally have a way for you to extend Numba to add a class, but you don't by default get the whole Python standard library compatible with it, and classes defined in user's packages aren't automatically compatible with it. So you have to start re-writing basic data structures and other types to get the full algorithm JIT'd. And Python's standard library isn't all setup with @generated_jit to then dispatch everything effectively. So you end up having to build tons of things on your own because of the lack of library support, and it adds development time. Then there are restrictions on allocations, no easy way to do distributed parallelism, take in functions to my functions to inline them (this is important for writing things like ODE solvers, but with Numba you have to do a hack to make this almost work but it won't optimize this well), etc. Essentially it hits a feature wall when it gets past microbenchmarks and to real programming. This all hearkens back to your first comment:

>Let's say you're exploring with NumPy

Exactly: if you're content with building everything off of arrays yourself then Numba will do fine. But its limitations significantly impact the ability to compose full packages in native Python with constant propagation, interprocedural optimization, etc.: all the stuff you you depend on static languages doing in order to optimize large-scale software. On top of that, it's a difficult dependency to install. So in the end, it's great for JITing little blurbs, but I quickly hit the edge when trying to fight it.

Cython is similar to Numba, where you can setup your own fused types and extension classes, re-writing the standard library, etc., but now you've taken to writing essentially the stdlib for a language (or at least the parts you need) just to be allowed to use this "easy accelerator" for a demanding project.

If it was 10 years ago these issues would be okay, but these days it stands out as a productivity and performance barrier. Julia has already solved this problem, and hell even LuaJIT is easier to build this kind of software with. Or if you're going to fuss around with compiled code, we can use D or Rust, or even go all the way back to C++ and Fortran. In fact, it's quite telling that even fairly recent big "Python" projects like TensorFlow, PyTorch, FEniCS, etc. all built their whole package in C++ and made Python bindings instead of trying to use these accelerators. Meanwhile people are punching out large Julia projects are using pure Julia without hesitation.

To an end user of packages it really doesn't matter how it was implemented, but when your work and research is in developing these kinds of large software projects, these limitations make a huge difference.

Thank you for the expansive answer - I can really see where you're from now and I agree with your issues in Python. It's interesting to me that you go to Julia as a solution (I mean that honestly, I want to play with it when an exploration phase comes up).
What was Julia lacking? I know and use R, but have been considering Julia.
I'm pretty sure GP was complaining about the lack of libraries and the breaking changes between verions. Julia is much younger, has a smaller community and still in active development as a language, so it is very likely that you have to build and maintain your own code.

With SciPy, the community is much bigger and present in every imaginable scientific field. Whenever someone comes up with a new algorithm, there's a python implementation before you know it. I'm not going to pretend that this isn't a huge added value, but I find the way GP phrased themselves a little bit annoying, as it implies the language Julia is is the issue. It is honestly a more beautiful and more efficient language to program in.

(also, these days you can call out to Python and R without significant overhead, so you can use any package out there; Julia has a pretty good interop story)

https://github.com/JuliaPy

https://github.com/JuliaInterop