

by throwaway33339 2097 days ago

The two-language problem is well-known. People wanted performance, which is reserved for languages like C, C++ or Java, but they didn't want to use these languages, since they are objectively ugly and a pain to write. Thus, languages like Python were born, but we were warned that they were going to be slow because something something dynamic typing something something the compiler can't optimize blah blah blah. And so we were told to avoid doing too many loops, or loading too many objects in memory, or indeed even attempting to push the language to match one's actual use cases, because Python wasn't well-built for it.

But in the meantime, languages like R or Matlab had figured out a solution: write all the heavy-lifting ultra-optimized algorithms in C or Fortran or some equally ugly language that no one but really smart nerds wants to touch, and wrap it in semantics that make loops and loading many objects unnecessary, called 'vectorized operations'. In R, for instance, you think you're manipulating mere strings or logicals, but you're in fact manipulating vectors of length 1 and of type 'character', 'logical', etc. But doing operations on vectors or arrays became as seamless as doing them with mere scalars, with hardly any loss in performance. And so the R world thrived, although we were still cautioned to use weird lapply/sapply/rapply magic instead of doing proper loops because something something compiler something something slow blah blah blah.

And so the Python world saw that the R and Matlab world thrived, and wondered if they could do the same. A bunch of really smart nerds sat down with their laptops and wrote a bunch of ultra-optimized algorithms in one of those ugly languages no one else wants to touch, and lo, in the mid-2010s Python had finally achieved feature parity with the R and Matlab of twenty years ago. Yet the trend showed no sign of slowing, as Python was not only useful for scientific computing, but for many other use cases as well (you ever tried to write an interface or webserver in R?), and sometimes researchers have the audacity to want to do several things at once with the computer. And so Python achieved its present ubiquity in data science.

There's trouble in paradise, however. As with R, we were cautioned to avoid doing too many loops because something something you know what I mean, and instead use vectorized operations. And little by little, we had to learn every day a little more of numpy's arcane API, the right magical formulas to invoke in order to avoid losing performance. We had to learn which operations are in-place and which ones create a new array (knowing this could change over multiple versions), which appropriate slicing and indexing to use, which specific functions to call. And the more our use cases deviated from the documentation, the more magic we had to learn. At some point we had to learn obscure methods beginning with an underscore, or even (the horror!) mind whether arrays were ordered C-style or Fortran-style, or were even told to use Cython (!), never mind your desire to absolutely avoid touching these languages in any way. May Allah be with you should you ever want to manipulate sparse data.
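
For readers wondering why C-style versus Fortran-style ordering matters, here is a minimal pure-Python sketch of how a multi-dimensional index maps to a position in a flat buffer under each layout. The helper name `flat_offset` is hypothetical, and this is not numpy's implementation, only an illustration of the idea.

```python
def flat_offset(index, shape, order="C"):
    """Return the position of a multi-dimensional index in a flat buffer.

    order="C": row-major, the last index varies fastest.
    order="F": column-major (Fortran-style), the first index varies fastest.
    """
    pairs = list(zip(index, shape))
    if order == "F":
        pairs.reverse()
    elif order != "C":
        raise ValueError("order must be 'C' or 'F'")
    offset = 0
    for i, n in pairs:
        offset = offset * n + i
    return offset


shape = (2, 3)  # 2 rows, 3 columns (made-up example shape)

# Offsets of the elements of row 0 under each layout.
c_row = [flat_offset((0, j), shape, "C") for j in range(3)]
f_row = [flat_offset((0, j), shape, "F") for j in range(3)]

print(c_row)  # [0, 1, 2]
print(f_row)  # [0, 2, 4]
```

Walking along a row touches neighbouring memory in C order but jumps by the column length in Fortran order, which is why the same loop can run at different speeds depending on the layout.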

Aware that the community had to learn magic whose complexity was on par with the ugly languages they'd sworn off, really smart nerds took it upon themselves to... write more magic in order to avoid writing the older magic. And so we got dask, which is as powerful as it is painful to use. We got numba, which seems to work automagically in the official demo snippets and zilch in your own. 'That's because you're using them wrong', the smart people tell you on stackoverflow. 'Teach me how to use them right', you beg. And so your mental spellbook thickens with no end in sight...

Enter Julia. Julia doesn't have any of the above dilemmas, because Julia is fast. Julia doesn't care whether you vectorize or write loops, but you can do either. Julia doesn't force you to declare types, but you can if you really want to. Julia doesn't require you to write advanced magic to do JIT compilation. Julia doesn't see itself as an R or Python competitor: why, Julia loves Python and R, and in fact you can just call one from the other if you feel like it! Go on, just RCall ggplot on an array created with PyCall's pyimport("numpy"), it just works! Julia was built with parallel computing and HPCs in mind, so no need to fiddle with dask boilerplate when it just works with @macros. Julia knows programmers are afraid of change, so its syntax is really, really close to Python's. Julia has a builtin package manager. Julia lets you use the GPU without having to sacrifice a rooster to Baal every time you want to install CUDA bindings.

Of course Python isn't going anywhere, just like R is still going strong even after Python 'displaced' it. And of course, Julia's ecosystem is smaller (but growing), its documentation is lacking, it doesn't have millions of already answered questions on Stackoverflow... but if you know where the wind blows, you know where the future is headed, and its name rhymes with Java.

4 comments

pbowyer 2097 days ago

> We got numba, which seems to work automagically in the official demo snippets and zilch in your own. 'That's because you're using them wrong', the smart people tell you on stackoverflow.

That's my experience. I was working on satellite image processing at the time, lots of Python loops in the code. Numba should've made a big difference according to the demos, but when benchmarking it didn't.

Adding a single decorator sounds wonderful, but I never found out why it didn't work.


VHRanger 2097 days ago

There's a learning curve. It only works when the `@jit(nopython=True)` doesn't reject compilation.

Otherwise it's no better than Python.

Generally numba works the same way writing C works: you pass in raw buffers (numpy arrays) and do processing directly on those buffers. That compiles to good LLVM IR and is fast.
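
A minimal sketch of the loop-over-raw-buffers style described above, using a made-up function `count_above` and made-up data. It assumes numba and numpy are installed; if they are not, the sketch falls back to plain Python so it still runs. `njit` is numba's shorthand for `jit(nopython=True)`.

```python
try:
    import numpy as np
    from numba import njit  # shorthand for jit(nopython=True)
except ImportError:
    # Assumption for this sketch only: without numba/numpy, run as plain Python.
    np = None

    def njit(func):
        return func


@njit
def count_above(values, threshold):
    # Explicit loop with scalar arithmetic on a flat buffer:
    # the kind of code nopython mode is designed to compile.
    count = 0
    for i in range(len(values)):
        if values[i] > threshold:
            count += 1
    return count


raw = [0.1, 0.7, 0.4, 0.9, 0.5]  # made-up sample values
values = np.array(raw) if np is not None else raw

result = count_above(values, 0.45)
print(result)  # 3
```

As far as I know, in nopython mode numba raises an error on the first call when it cannot compile the function, instead of quietly running it as ordinary Python, so a rejected compilation shows up as a traceback and not as a missing speedup.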


pbowyer 2096 days ago

> There's a learning curve. It only works when the `@jit(nopython=True)` doesn't reject compilation.

Is there a way to get it to tell you when it rejects compilation? I don't recall one at the time (this was ~4 years ago) and spent a week and a bit trying to get it to work.


lasagnaphil 2097 days ago

One gripe I had with the Julia language: incredibly slow startup. Although it was touted to be good for quick interactive scripting, the time it takes for the JIT compiler to compile packages and scripts was a huge dealbreaker for me. (I expected simple operations such as plotting a line to be done in less than a second, but found myself waiting minutes for the plotting library to compile.) This really threw me off and prevented me from further exploring the language. I might consider trying it again when the situation is better.

The main culprit seems to be that the LLVM JIT compiler is slow to compile, although it does compile Julia to really efficient native code. For example, you don't have problems like this in LuaJIT: although its JIT does less thorough optimizations than Julia's (hence probably slower and more unpredictable runtime performance), it wins in usability by having a really fast non-compiled interpreter path written in assembly (because Mike Pall is a robot from the future). Obviously Lua is quite different in that it has no type annotations to lean on, while Julia's compiler specializes code on inferred or declared types, and keep in mind that Lua's design had some major flaws as a scientific language (hence most ML researchers moving from LuaTorch to PyTorch).


kgwgk 2097 days ago

> if you know where the wind blows, you know where the future is headed, and its name rhymes with Java.

https://www.rhymes.net/rhyme/java


pbowyer 2097 days ago

I have no clue where the rhyming future is headed. Anyone solved this riddle?


peteradio 2097 days ago

Yea but Julia indexes from 1 ... HARD PASS!


cultus 2097 days ago

That is the standard in scientific computing languages, as well as math. Fortran and Matlab are both 1-based.
