by ryndbfsrw 1973 days ago
If we removed dplyr, R scripts would absolutely scream, so I find the speed argument for 'why switch to X' unconvincing. If users cared that deeply about speed, almost no one would be using the tidyverse; we'd all be using base R or data.table instead.

Multiple dispatch? Hmm, is this really a problem that I'm going to come across in the real world, when 90% of our time is spent ingesting a poorly formatted CSV, doing some quick plots and perhaps building a model to test something out? If the goal of Julia is to replace R/Python, then their priorities feel way off the mark.

2 comments

> If the goal of Julia is to replace R/Python then their priorities feel way off the mark

There's a lot more to scientific computing than wrangling tabular data. Julia is competing in that overall space with R/Python/Fortran/Java/C++. If R or Pandas is better at data wrangling, then Julia won't win out there. But so be it. No PL is best at everything.

> There's a lot more to scientific computing than wrangling tabular data.

Also a point that gets ignored way too often. My original post differentiated between time spent writing models and time spent data wrangling.

I would never even attempt to write a symplectic integrator in base R (OK maybe Rcpp would be fine but that's not really "R"). Julia, by design, is better at that. But the R ecosystem is so good that I can use the best practical implementation of a symplectic integrator to solve common modeling problems via RStan.

Yes, Stan is a standalone framework that can be accessed from Julia as well. But the following workflow is much easier to do in R:

  1) Read in badly formatted CSV data
  2) Wrangle the data into a useable form
  3) Do some basic exploratory analysis (including plots)
  4) Write several models in brms/raw Stan (via rstan)
  5) Simulate from the priors and reset them to more sensible values
  6) Run the model over the data to generate the posterior
  7) Plot/run posterior predictive checks, counterfactual analysis, outlier analysis (PSIS or WAIC), etc.

Again, the above represents my common use case. I fully appreciate that people use Julia to do awesome stuff like "the exploration of chaos and nonlinear dynamics" [0]. I understand that the modern R ecosystem isn't really built for this.

[0] https://juliadynamics.github.io/DynamicalSystems.jl/latest/

Totally agree there. It is not a replacement, and it is trying to solve a different problem. I don't believe Julia contributors are lying awake at night, upset that other languages exist and feeling they need to put a stop to that. My point (put across clumsily, I see) is that IF that was their goal, then they are going about it the wrong way, as most R/Python users have different priorities. But it is a moot point, as that would be an absurd motivation to create a whole new language.

> is this really a problem that I'm going to come across in the real-world when 90% of our time is spent ingesting a poorly-formatted csv, doing some quick plots and perhaps building a model to test something out

Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.
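
As a rough illustration of the plot() example, here is a minimal multiple-dispatch sketch in Python. It is not how Julia or the plotting recipes in [1] are implemented. The `dispatch` decorator, the `LinearModel` type, and the two backend classes are hypothetical names made up for this sketch, and it only matches exact argument types (no inheritance).

```python
# Minimal multiple-dispatch sketch: the implementation is chosen from the
# runtime types of ALL arguments, not just the first one.
# Every name below is hypothetical and exists only for illustration.
_registry = {}  # maps (function name, argument types) -> implementation


def dispatch(*types):
    """Register an implementation for an exact tuple of argument types."""
    def decorator(func):
        name = func.__name__
        _registry[(name, types)] = func

        def dispatcher(*args):
            # Look up the implementation by the types of all arguments.
            key = (name, tuple(type(a) for a in args))
            if key not in _registry:
                raise TypeError(f"no method {name} for argument types {key[1]}")
            return _registry[key](*args)

        return dispatcher
    return decorator


class LinearModel:  # hypothetical model type
    pass


class TextBackend:  # hypothetical plotting backend
    pass


class SvgBackend:  # hypothetical plotting backend
    pass


@dispatch(LinearModel, TextBackend)
def plot(model, backend):
    return "ascii scatter with fitted line"


@dispatch(LinearModel, SvgBackend)
def plot(model, backend):
    return "<svg>scatter with fitted line</svg>"
```

Calling `plot(LinearModel(), SvgBackend())` picks the SVG method and `plot(LinearModel(), TextBackend())` picks the text one, so a new model type or a new backend can add its own method without touching the existing ones.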

Also, why shouldn't dplyr perform comparably against data.table? Seems like there would be no need for a fragmented library ecosystem here if the abstractions the tidyverse is built upon were lower-cost. Moreover, what if my data isn't CSV or in a table-like shape at all? "real world" does not mean the same thing across different domains.

[1] http://docs.juliaplots.org/latest/recipes/

> Yes, multiple dispatch is not some highfalutin ivory tower concept that only comes up in specialized code. For example, the model in question could define custom plotting recipes[1] so that you can just call plot() and have it produce something useful.

This is literally the whole conception behind generic functions in R (print, plot, summary etc).

I agree it's great, but Julia is building on a lot of prior art here.
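
For readers who haven't used R's generics, a loose Python analogue is `functools.singledispatch` from the standard library. Like R's S3 system it dispatches on the type of the first argument only (R's S4 system can dispatch on several). The `summary` function below is a hypothetical stand-in for this sketch, not a port of R's `summary`.

```python
from functools import singledispatch


@singledispatch
def summary(obj):
    # Fallback method, similar in spirit to R's summary.default.
    return f"object of type {type(obj).__name__}"


@summary.register
def _(obj: list):
    # Chosen when the first argument is a list.
    return f"list of length {len(obj)}"


@summary.register
def _(obj: dict):
    # Chosen when the first argument is a dict.
    return f"dict with keys {sorted(obj)}"
```

The caller always writes `summary(x)`, and the method that runs depends on what `x` is.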

For sure, and one would be remiss not to mention Dylan, CL/CLOS and Clojure here as well. My quibble was with the claim that multiple dispatch rarely shows up in practice, which you've pretty clearly shown is not the case in R!

Yup, the R-FAQ specifically calls out Dylan and CL as influences.

'highfalutin ivory tower' is a great name for a band :D

Naturally you are correct and I am wrong to dismiss it as unimportant. What I'm saying is that the majority of R/Python users today are not looking for ultimate speed or sophisticated programming paradigms. Most users are doing the unsexy bread and butter of 'take some tabular data' -> analyse -> report on it. I want to dismiss the argument that 'users will migrate to Julia because of these nifty features' because it ignores the very reasons existing users chose these tools in the first place. It would be as absurd as proclaiming that Excel users will switch to Python because the accounts department suddenly cares about NLP.