Hacker News new | ask | show | jobs
by rlh2 1238 days ago
The obsession with cpu speed almost always confuses me in these topics. Time it takes to program is way more important, and that’s where a terse language like R shines. The base/most common functions are almost always executing C anyway. It’s kind of like lisp in that it’s easy to write slow code, but who cares if it’s “fast enough”? Also, it’s almost always easy to speed up if necessary at the R level and R’s C API is also easy to use for for numeric computing/optimization which is exposed at the C level if you want to use it.
4 comments

It depends. Take for example any omic dataset where you might need to run a GLM model on ~500,000 rows. Codes I've seen for this operation can range in time from taking 30 minutes to 2 days.

My take away here is that, sure, for one operation the speed is not that critical, but there is always the case where that one operation will be used close to a million times in one analysis and then it all adds up. On top of that if it's implemented in C then the invocation from R to C and back will be happening that many times which adds to the slowness.

Yes, I use R, Julia, and Python from time to time depending on the case and my mood and they all have their advantages and disadvantages.

R is more than fast enough for straightforward prototypical analyses where a lot of the code is calling C or something lower level and you're not introducing something "new" to the interpreter system. But if you want to do some unusual optimization there's going to be something that bottlenecks everything unless you go into C/C++/Fortran yourself, and then Julia is a good compromise. I've had times when Julia didn't save any time whatsover, and other times when it took something that would literally run over a week at least in R and it was done in 30 minutes in Julia.

Having said that, the more I use Julia the more I find myself scratching my head about it. It's very elegant but it's just low-level enough that sometimes I wonder if it's worth it over, say, modern C++ or something similarly low level, which tends to have nice abstracted libraries that have accumulated over the years. I also have the general impression, mentioned in a controversial post discussed here on HN, that a lot of Julia libraries I've used just don't quite work for mysterious reasons I've never been able to figure out. Everything with Julia has gotten better with time but I still have this sense that I could put a lot of time into some codebase, and have it just hit a wall because of some dependency that's not operating as documented.

There's kind of an embarrassment of riches in numerical computing today, and yet I still have the feeling there's room for something else. Maybe that's the mythical golden language that's lured all sorts of language developers since the beginning though.

I have been thinking the same and had similar timing experiences. As Julia is lower level than R/Python, there is a lot of annoying things to take care of that are not needed in R/Python. And then why not use, say Rust? Or just Rcpp in R. We just did a small test program in Rust that is called very often on the command line and takes a couple of seconds to run. Very happy with the experience. Same run speed as Julia, 10 times faster than R/python, and no 60 second load time like julia.
Julia 1.9, now in beta, implements native code caching. Precompiling a Julia package now creates a native shared library, a ".so", ".dylib", or ".dll" file. For some packages, this lowers load time considerably. It may some time before many packages take full advantage of this.

The promise of Julia is that you can have the high-level interface and the low-level code in the same language. The alternative would be coding the low level code in Rust or C and then creating bindings for Python or R.

For a while Julia made the most sense for long-running code that is that is executed almost as often as it is modified (e.g. scientific computing). In this situation Rust or C static compilation times become a hinderance. As ahead-of-time and static compilation features get added to Juliaz this scope will expand.

Yes I follow this. The load time keeps getting better. And am looking forward to 1.9.

I really don't want to come across as negative, Julia is a fantastic language, and my hope is that that it will continue its impressive improvement path.

But to follow form the thread's sentiment, I have the feeling Julia lives in an unstable equilibrium. It is lower level than R/Python but doesn't quite deliver the benefits of rust/c/fortran/c++. I find my colleagues gravitate to one of the 2 equilibria.

Maybe your last paragraph crystallizes it. If one lives in the REPL, Julia is wonderful. Not how I work. I prefer the command line. Have new data, run code on it. Data changes in real time, code not. My code may run millions of times on different operating systems and only infrequently change.

We already have some forward prototypes of being able to run Julia ahead-of-time compiled native code from the command line.

https://github.com/brenhinkeller/StaticTools.jl

I think what we'll end up with is a language that can be used in both a fully static mode and in a dynamic mode along with some possible mixing. We may yet get the benefits of a statically compiled language as the tooling continues to develop. I do not see anything inherent in the language that would prevent that from happening.

In Julia you can go low-level, but there is no requirement. You can write purely high-level, generic, untyped code, with good performance. So I'm a bit reluctant to accept the claim that it's lower level.

What are the things where low-level code is required in Julia, but not in Python/R?

One of the key points of Julia is that the language you use for performance critical parts is also Julia. That applies to both the libraries like DataFrames.jl and for situations where you'd drop to a lower level language when optimising. I think being productive in Fortran or C++ is unrealistic for most scientific programmers.
It is a trade-off and a sweet spot has a lot to do with the specific context and background. Run speed matters a lot when the difference is between having to run your code on a dataset for half an hour vs through the whole night. Once you have prototyped your code, you are gonna use it more and more (not to mention runs in order to tweak parameters or validate results), and R's speed is not satisfying enough for my work. Python matlab are easy and fast enough to program in, and much faster for tasks that are computing-heavy. If I was getting into C I would not have saved as much time as I would have put into learning how run eg parallel tasks there safely. Moreover, R is not necessarily faster to program, always; real (ie tidyverse-style) R is quite idiosyncratic, if you come from a programming and not from a statistics background probably it will take more time to learn than it is worth unless it is sth important in your work environment.
When someone understands what is happening when their program executes they will write faster programs without much more effort.

You might like writing slow programs, but that doesn't mean people like using them.