|
The two-language problem is well known. People wanted performance, which is reserved for languages like C, C++ or Java, but they didn't want to use these languages, since they are objectively ugly and a pain to write. Thus, languages like Python were born, but we were warned that they were going to be slow because something something dynamic typing something something the compiler can't optimize blah blah blah. And so we were told to avoid doing too many loops, or loading too many objects in memory, or indeed even attempting to push the language to match one's actual use cases, because Python wasn't built for it. But in the meantime, languages like R or Matlab had figured out a solution: write all the heavy-lifting ultra-optimized algorithms in C or Fortran or some equally ugly language that no one but really smart nerds wants to touch, and wrap them in semantics that make loops and loading many objects unnecessary, called 'vectorized operations'. In R, for instance, you think you're manipulating mere strings or logicals, but you're in fact manipulating vectors of length 1 and of type 'character', 'logical', etc. That way, doing operations on vectors or arrays became as seamless as doing them on mere scalars, with hardly any loss in performance. And so the R world thrived, although we were still cautioned to use weird lapply/sapply/rapply magic instead of doing proper loops because something something compiler something something slow blah blah blah. And so the Python world saw that the R and Matlab worlds thrived, and wondered if they could do the same. A bunch of really smart nerds sat down with their laptops and wrote a bunch of ultra-optimized algorithms in one of those ugly languages no one else wants to touch, and lo, in the mid-2010s Python had finally achieved feature parity with the R and Matlab of twenty years earlier.
Yet the trend showed no sign of slowing, as Python was useful not only for scientific computing but for many other use cases as well (have you ever tried to write an interface or a webserver in R?), and sometimes researchers have the audacity to want to do several things at once with the computer. And so Python achieved its present ubiquity in data science. There's trouble in paradise, however. As with R, we were cautioned to avoid doing too many loops because something something you know what I mean, and to use vectorized operations instead. And little by little, every day, we had to learn a little more of numpy's arcane API, the right magical formulas to invoke in order to avoid losing performance. We had to learn which operations are in-place and which ones create a new array (knowing this could change from one version to the next), which slicing and indexing to use, which specific functions to call. And the more our use cases deviated from the documentation, the more magic we had to learn. At some point we had to learn obscure methods beginning with an underscore, or even (the horror!) mind whether arrays were ordered C-style or Fortran-style, or were even told to use Cython (!), never mind your desire to absolutely avoid touching these languages in any way. May Allah be with you should you ever want to manipulate sparse data. Aware that the community had to learn magic whose complexity was on par with that of the ugly languages they'd sworn off, really smart nerds took it upon themselves to... write more magic in order to avoid writing the older magic. And so we got dask, which is as powerful as it is painful to use. We got numba, which seems to work automagically in the official demo snippets and zilch in your own. 'That's because you're using them wrong', the smart people tell you on Stack Overflow. 'Teach me how to use them right', you beg. And so your mental spellbook thickens with no end in sight... Enter Julia.
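For readers who haven't run into these yet, here is a minimal sketch of the numpy details listed above: in-place versus new array, view versus copy, and C versus Fortran order. The array and the variable names are made up for illustration; the behaviour shown is what current numpy does for these particular operations, and other operations may behave differently.

```python
import numpy as np

a = np.arange(6, dtype=np.float64).reshape(2, 3)  # [[0, 1, 2], [3, 4, 5]]

# Out-of-place: `a * 2` allocates a new array and leaves `a` untouched.
doubled = a * 2
allocates_new = not np.shares_memory(a, doubled)

# In-place: `*=` writes into the existing buffer, so every name bound
# to that array sees the change.
b = a.copy()
alias = b
b *= 2

# Basic slicing returns a view on the same memory;
# indexing with a list ("fancy" indexing) returns a copy.
row_view = a[0, :]
row_copy = a[[0], :]
slice_is_view = np.shares_memory(a, row_view)
fancy_is_copy = not np.shares_memory(a, row_copy)

# Memory order: arrays are C-ordered (row-major) by default, and the
# transpose of a C-ordered 2-D array is a Fortran-ordered view.
c_ordered = a.flags["C_CONTIGUOUS"]
t_is_f_ordered = a.T.flags["F_CONTIGUOUS"]

# An explicit Fortran-ordered copy holding the same values.
f_copy = np.asfortranarray(a)
```

None of this is hard on its own; the point of the rant stands in that you have to know which of these cases you are in before you can reason about performance or about who else sees your writes.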
Julia doesn't have any of the above dilemmas, because Julia is fast. Julia doesn't care whether you vectorize or write loops: you can do either. Julia doesn't force you to declare types, but you can if you really want to. Julia doesn't require you to write advanced magic to do JIT compilation. Julia doesn't see itself as an R or Python competitor: why, Julia loves Python and R, and in fact you can just call one from the other if you feel like it! Go on, just RCall ggplot on an array created through PyCall's pyimport("numpy"), it just works! Julia was built with parallel computing and HPC in mind, so no need to fiddle with dask boilerplate when it just works with @macros. Julia knows programmers are afraid of change, so its syntax is really, really close to Python's. Julia has a builtin package manager. Julia lets you use the GPU without having to sacrifice a rooster to Baal every time you want to install CUDA bindings. Of course Python isn't going anywhere, just like R is still going strong even after Python 'displaced' it. And of course, Julia's ecosystem is smaller (but growing), its documentation is lacking, it doesn't have millions of already answered questions on Stack Overflow... but if you know where the wind blows, you know where the future is headed, and its name rhymes with Java. |
That's my experience. I was working on satellite image processing at the time, with lots of Python loops in the code. According to the demos, Numba should've made a big difference, but when I benchmarked it, it didn't.
Adding a single decorator sounds wonderful, but I never found out why it didn't work.
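There is no way to tell from here what went wrong in that project. One common pitfall, which may or may not be what happened, is timing the first call of a decorated function: Numba compiles lazily, so the first call includes compilation time. The sketch below times the first and second call separately. The function `sum_of_squares` and the random array are made-up stand-ins, not the original satellite code, and the `try`/`except` is only there so the snippet still runs (without any speedup) when Numba is not installed.

```python
import time
import numpy as np

try:
    from numba import njit  # the real Numba decorator, if available
except ImportError:
    def njit(func):
        # Hypothetical fallback: no compilation, just return the function.
        return func

@njit
def sum_of_squares(img):
    # Plain nested loops over a 2-D array, the kind of code Numba targets.
    total = 0.0
    for i in range(img.shape[0]):
        for j in range(img.shape[1]):
            total += img[i, j] * img[i, j]
    return total

# Made-up stand-in for one image band.
img = np.random.default_rng(0).random((200, 200))

t0 = time.perf_counter()
first = sum_of_squares(img)   # with Numba, this includes compilation
t1 = time.perf_counter()
second = sum_of_squares(img)  # this reuses the already compiled code
t2 = time.perf_counter()

print(f"first call:  {t1 - t0:.4f} s")
print(f"second call: {t2 - t1:.4f} s")
```

`njit` is used here rather than a bare `jit` because it raises an error when a function can't be compiled in nopython mode, instead of quietly running it the slow way.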