by superdimwit 2535 days ago
I'd really recommend that anyone doing mildly numerical / data-ey work in Python give Julia a patient and fair try.

I think the language is really solidly designed, and it gives you ridiculously more power AND productivity than Python for a whole range of workloads. There are of course issues, but even in the short time I've been following and using the language, they are being rapidly addressed. In particular: a generally less rich ecosystem of libraries (though some Julia libraries are state of the art across all languages, mainly thanks to easy metaprogramming and multiple dispatch), and generally slow compile times (though this is improving rapidly with caching etc.). I would also note that you often don't really need as many "libraries" as you do in python or R, since you can typically just write down the code you want to write, rather than being forced to find a library that wraps a C/C++ implementation like in python/r.
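
For Python readers unfamiliar with the term, multiple dispatch means the implementation is chosen from the runtime types of all arguments, not just the first. Below is a toy sketch of the idea only. The `register` decorator, the `_METHODS` table, and the `combine` function are hypothetical names made up for illustration; this matches exact types only and is not how Julia implements dispatch.

```python
# Toy registry mapping a tuple of argument types to an implementation.
# All names here are hypothetical, for illustration only.
_METHODS = {}

def register(*types):
    """Register the decorated function for the given argument types."""
    def decorator(func):
        _METHODS[types] = func
        return func
    return decorator

def combine(a, b):
    """Pick an implementation based on the types of BOTH arguments."""
    impl = _METHODS.get((type(a), type(b)))
    if impl is None:
        raise TypeError(
            f"no method for ({type(a).__name__}, {type(b).__name__})"
        )
    return impl(a, b)

@register(int, int)
def _combine_int_int(a, b):
    return a + b

@register(str, int)
def _combine_str_int(a, b):
    return a * b  # repeat the string b times

@register(list, list)
def _combine_list_list(a, b):
    return [x + y for x, y in zip(a, b)]

print(combine(2, 3))
print(combine("ab", 2))
print(combine([1, 2], [3, 4]))
```

In a language with dispatch built in, any package can add a new method for its own types to an existing function, which is part of why libraries compose well without a shared registry like the one faked here.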


>you can typically just write down the code you want to write, rather than being forced to find a library that wraps a C/C++ implementation like in python/r.

I don't think this is really a feature. It's nice that you can write more performant code in Julia directly and don't need to wrap lower-level languages, without question, but the lack of libraries or library features is not a good thing. It's always better to use a general-purpose library that's been battle-tested than to write your own numerical mathematics code (because bugs in numerical code can take a long time to get noticed).

For specialized scientific computing applications, which would normally be written in C/C++, I would absolutely look into using Julia instead (though I'm not sure what the OpenMP/MPI support is like). But I would also recommend against rolling your own numerical software unless you need to.
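
To make the point about slow-to-surface numerical bugs concrete, here is a minimal Python sketch (not from the thread). `naive_variance` is a hypothetical hand-rolled helper using the textbook one-pass formula. It looks right on small inputs but loses its accuracy once the data sit on a large offset, whereas the standard library's `statistics.variance` handles both cases.

```python
import statistics

def naive_variance(xs):
    """Hypothetical hand-rolled sample variance using the one-pass
    'sum of squares minus square of the sum' formula."""
    n = len(xs)
    total = 0.0
    total_sq = 0.0
    for x in xs:
        total += x
        total_sq += x * x
    # Subtracting two huge, nearly equal numbers cancels most of the
    # significant digits, so the small true difference is lost.
    return (total_sq - total * total / n) / (n - 1)

small = [4.0, 7.0, 13.0, 16.0]
shifted = [x + 1e9 for x in small]  # same spread, so the true variance is unchanged

print(naive_variance(small))         # agrees with the library on small inputs
print(statistics.variance(small))
print(naive_variance(shifted))       # badly off because of cancellation
print(statistics.variance(shifted))  # library result is unaffected by the offset
```

A quick test on small data would pass, so a bug like this can sit unnoticed until real inputs hit it.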

I don't just think it's a feature, I think it's a killer feature.

You are much less likely to reinvent the wheel if you can add your one critical niche feature / bugfix to an existing library. In Python, having to learn C, C build systems, and Python's C API is a gigantic barrier to doing that.

More importantly, if every fast data manipulation needs to be written in C, a few of them can be profitably shared, but you need more than a few of them. Pretty soon you wind up with a giant dumping ground of undiscoverable API bloat. See: pandas.

Maybe I don't understand what API bloat is in this context -- can you give some more detail regarding your thoughts on pandas?
Here's one of the fifteen API ref sections in pandas:

https://pandas.pydata.org/pandas-docs/stable/reference/serie...

Even though it's long, it undersells the problem, because many of these methods have nontrivial overload semantics that open up like a fractal when you look in turn at their docs. The link also undersells the problem because this junkheap is evidently so incomplete that people are frequently forced to rely on numpy to extend it.

APIs should make hard things easy, but API gloveboxes like this make easy things hard. Minimal API + Performant Glue >> We do everything for you + You can't ever touch your own data or your perf dies + Good luck reverse engineering these semantics if you've forgotten the context and need to port it.

Okay, I think I see your point. You are viewing the different object methods as API calls, and because they are granular and cover many common and uncommon tasks, you see this as bloat. Makes sense from that perspective. Thanks.
While Python has good libraries in general computing, and it has good ML libraries, it's really lacking in scientific computing (numerical linear algebra, differential equations, etc.). For example, what's a Newton-Krylov IMEX integrator in Python? Boundary value DAEs? I know of libraries for these things in Fortran, C++, and Julia... but not Python. It's also well-known that Python lacks a lot of the statistics libraries of R. When you chart it out, Python tends to just have the bare minimum of support in every area (except ML, it has good ML libraries), which if it's what you need, great! But...
Are all the plotting/visualization options still half baked?
I've found Plots.jl and PyPlot.jl to work well for most basic things, though they are not always entirely pleasant to use (compilation time, for example, though this should hopefully improve). The only real problem I had is that they are not quite sufficient for plots to be published in a paper: many visual tweaks you might want are broken or terribly documented, so I have to just use matplotlib or R. They are generally great for Jupyter notebooks though. I see the current deficiencies as highlighting just how much work went into matplotlib and others to get where they are today (and even mpl is in some ways still lacking, for example 3D surfaces and meshes). It is unfortunate though, as plotting is a core functionality for their main target of computational science. But to answer your question, mostly yes. Everything seems to be slowly improving though.
No matter the tool I use these days for plotting, I export it as a .tex file to use PGFPlots. matlab2tikz, matplotlib2tikz, and the savefig function in Plots.jl (with the pgfplots backend) all do the job. This way you can tweak the figure in the final document, which I prefer. You can adjust all of the properties of the plot in LaTeX.
Yes, they are. Slow, and hardly as expressive or rich as their Python/R counterparts.
One can use matplotlib in Julia by PyCall'ing it. So it is at least as good as anything else.
Or ggplot2 using RCall, which is what I use and it's quite nice.