Hacker News
by orhmeh09 746 days ago
OOP has long been a critical part of real-world R, especially S4 for complex implementations of classes of kernels, algorithms, and so on, and R6 for more general-purpose work. Without these frameworks, such implementations would be difficult.
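
To make the S4 point concrete, here is a minimal sketch of a small class hierarchy for kernels, with a generic function that dispatches on the kernel's class. The class names (`Kernel`, `RBFKernel`, `LinearKernel`), their slots and the generic `evaluate` are hypothetical, chosen only for illustration; `setClass`, `setGeneric`, `setMethod` and `new` are the standard functions from R's bundled methods package.

```r
library(methods)  # bundled with R

# Hypothetical virtual base class: it cannot be instantiated directly
setClass("Kernel", representation("VIRTUAL"))

# Hypothetical concrete kernels, each carrying its own parameter slot
setClass("RBFKernel", contains = "Kernel", slots = c(sigma = "numeric"))
setClass("LinearKernel", contains = "Kernel", slots = c(offset = "numeric"))

# A generic function; the method is selected by the class of `k`
setGeneric("evaluate", function(k, x, y) standardGeneric("evaluate"))

setMethod("evaluate", "RBFKernel", function(k, x, y) {
  exp(-sum((x - y)^2) / (2 * k@sigma^2))
})

setMethod("evaluate", "LinearKernel", function(k, x, y) {
  sum(x * y) + k@offset
})

rbf <- new("RBFKernel", sigma = 1)
lin <- new("LinearKernel", offset = 0)

print(evaluate(rbf, c(0, 0), c(0, 0)))  # 1
print(evaluate(lin, c(1, 2), c(3, 4)))  # 11
print(is(rbf, "Kernel"))                # TRUE
```

New kernels can be added later with another `setClass()`/`setMethod()` pair, without touching the existing code.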

Personally, I find it more expressive for general-purpose computation than Python. The "fs" library is much better at working with files and paths than Python's "os" and the several other modules that can be needed for typical filesystem operations, especially if you are working with more than one file at a time.

I would even say that each of the R object systems is more expressive and more flexible than the Python one. I suspect lazy evaluation is a part of this.


Maybe too expressive and too flexible? Different R programs can have wildly different dialects, making it difficult for two R programmers to even understand each other.

I’ve seen comics depicting the learning curve for R as having local minima beyond which there are further peaks and troughs of knowledge. A beginner might learn enough to get by, but find the code of someone on the other side of one of those peaks to be a foreign language.

Having your coders not understand each other is problematic in a production environment.

This is fair. For what it's worth, Python is tending toward this too, and I think it is introducing new syntax at a faster rate, with things such as structural pattern matching and typing, which I have had difficulty explaining to people who don't keep up with each new release.

Having switched from R to Python, I find this resonates. I write base (non-tidy) R, and it's definitely a different language from tidy R. That said, having written a fair amount of base Python, I find that jumping into torch/TensorFlow feels like an even bigger gap than base R vs. tidy R.

Pathlib gives a decent OOP interface to most file operations. The pathlib docs have a handy Rosetta Stone table for replacing `os` and `shutil` invocations.

It could still be better, and sadly being in the stdlib means it probably won't be improved.

For example, we still need shutil.rmtree to recursively nuke a directory. The pathlib way of doing it is laborious and error-prone: https://stackoverflow.com/questions/13118029/deleting-folder...

Pathlib is a great addition, and it's nice that it's in the standard library. I also like fsspec, which overlaps with some of Python's existing libraries but makes things a little cleaner IMO.

I too much prefer R to Python (it’s far more expressive, for one); however, it’s clear now that Python has “won” in this space and R is a tough sell to a wider team.
> it’s clear now that Python has “won” in this space and R is a tough sell to a wider team

R has always been a language for academics, and it continues to be popular in that domain, with no compelling reason to switch. It has seen usage in the private sector, but that has never been the driving force behind R's development or ecosystem, and I doubt it ever will be. For academics, even if a particular function is only available for Python, it's easy enough to call it from R and do everything else in R.

My experience is that you can "sell" R when the statistical or modelling technique is not implemented (or not implemented well) in Python, which still includes a lot of potentially useful statistical techniques! R should/could lean into being the tool that Python programmers reach for when they need something a little less mainstream, if it were made easier for non-R programmers to do so.

Oh, I agree on that, and I know that most of the time I'll be asked why something isn't in Python, so I mainly reserve R for when I am sure nobody will ask "why R?" :-)

There is also a benefit nowadays: I can rely on Python >= 3.6 being available by default anywhere I deploy, whereas R has to be installed somehow. So, like Bash, Python is part of a toolbox I can rely on being available with at least a constrained set of features.

> I suspect lazy evaluation is a part of this.

I had no idea R was lazy. Makes me wanna learn it now.

R is such a weird little language. It's basically lazy Lisp dressed up in C syntax.

For example, the only operator it really has is the function call. Everything else is syntactic sugar for a function call, and I mean literally everything: assignments, conditionals, loops, even function definitions and curly braces are all function calls. Some examples:

   a <- 1
   # is the same as
   `<-`(a, 1)

   if (a == 1) print("ok") else { print("wtf"); print(a) }
   # is the same as
   `if`(a == 1, print("ok"), `{`(print("wtf"), print(a)))

   function(x, y = 42) x + y
   # is the same as
   `function`(pairlist(x=, y=42), x + y)  # not quite, but close enough
and so on. You can see what any R expression or statement looks like under the hood by printing as.list(quote(...)) and recursively doing the same for every element of the resulting list.
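
The equivalences above can be checked directly. This is a small sketch using only base R that runs a sugared form next to its explicit function-call form and inspects a call with `as.list(quote(...))`; the variable names are arbitrary. Since the `function` line above is explicitly approximate, the last part shows one way to actually build that closure from its parts, via `call()`, `as.pairlist()` and `alist()`.

```r
# Assignment: `<-` can be called like any other function
`<-`(a, 1)
print(identical(a, 1))  # TRUE

# `if` and `{` can be called as functions too; only the chosen branch runs
`if`(a == 1, print("ok"), `{`(print("wtf"), print(a)))

# Inspect the call structure of an if/else expression
parts <- as.list(quote(if (a == 1) print("ok") else print(a)))
print(length(parts))  # 4: the `if` symbol, the condition, and both branches
print(parts[[1]])     # the `if` symbol
print(parts[[2]])     # the condition a == 1

# Build the closure function(x, y = 42) x + y from its parts
fn <- eval(call("function", as.pairlist(alist(x = , y = 42)), quote(x + y)))
print(fn(1))          # 43
```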

This is possible because arguments in R are not evaluated when they are passed to a function. Instead, the function receives the expression the caller used for that argument, bundled with the environment in which that expression was written; R calls this bundle a promise. It's kind of like instead of (foo (+ x y)), you'd write:

   (foo (list '(+ x y) (lambda () (+ x y))))
i.e. for the argument, instead of its evaluated value, you passed both the quoted expression and the lambda that computes it in the original environment. When the actual value of the argument is needed, the expression is evaluated and the result is cached in the promise (so implicit eval is lazy and one-off). But the function can instead just query for the argument expression directly and then use it in some other way - so e.g. the `<-` function does not eval its first argument, but instead uses it to identify the variable being set.
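
Here is a short sketch of the promise behaviour described above, using only base R; the function and variable names are made up for the example. It shows that a promise is forced at most once (the side effect inside the argument runs a single time), that an argument which is never used is never evaluated, and that `substitute()` hands back the argument's expression instead of its value.

```r
# Promises are forced at most once: the side effect below runs a single time
counter <- 0
use_twice <- function(x) {
  x  # first use forces the promise, in the caller's environment
  x  # second use returns the cached value; no re-evaluation
}
use_twice({ counter <- counter + 1; counter })
print(counter)  # 1

# An argument that is never used is never evaluated
ignore_arg <- function(x) "argument never forced"
print(ignore_arg(stop("this error is never raised")))

# A function can ask for the argument's expression instead of its value
show_expr <- function(x) substitute(x)
print(show_expr(some_undefined_var + 1))  # some_undefined_var + 1
```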

Thanks for a great explanation. It looks like what Scala does with its by-name parameters, but for every parameter by default. An even closer analogy, I think, is Io, which works in a very similar way: method bodies can access their arguments as Message objects (i.e. unevaluated calls) and then decide to evaluate them as needed. This differs from your example in that the body can choose the context in which the message send is executed; it doesn't have to be the lexical scope of the caller. It enables a great deal of expressivity, especially coupled with some syntactic sugar for "operators", and I always wondered why more languages don't have that feature.

In R, you can also choose the context in which the argument expression is evaluated. If you just use the promise as if it were a value and rely on implicit evaluation, then it happens in the context of the caller, yes. But environments (i.e. sets of name-value bindings) are first-class objects in R, so they can be captured at any point and later used to explicitly evaluate a promise's expression after retrieving it.

   foo <- function(x, env) {
     print(x)  # implicit eval, in the caller's environment

     x_expr <- substitute(x)   # gets the associated expression
     print(eval(x_expr, env))  # explicit eval in a different environment
   }

   bar <- function(y) {
     environment()  # capture and return the local environment of the function
   }

   y <- 1
   env <- bar(2)
   foo(y * y, env)  # prints 1, then 4
Side note: substitute() seems like a weird name for a function that returns the underlying expression of a promise. It's named that way because its intended use is actually similar to quasiquotation: it lets you explicitly substitute variable names in an expression with something else before evaluating it. So e.g. substitute(x <- x + 1, list(x = 2)) returns the expression object for 2 <- 2 + 1. Calling it without a substitution list is a special case: inside a function, it then uses the function's local environment, so a symbol bound to a promise (such as an argument) is replaced by that promise's expression, which is how substitute(x) recovers the argument expression. In practice that is probably the most common way to use it.
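
A minimal check of the `substitute()` behaviour described above, covering both the explicit substitution list and the common no-list form inside a function. The names `e`, `e2`, `z` and `grab` are illustrative; `substitute()`, `quote()` and `eval()` are base R.

```r
# Replace x with 2 in the expression, without evaluating it
e <- substitute(x <- x + 1, list(x = 2))
print(e)                                # 2 <- 2 + 1
print(identical(e, quote(2 <- 2 + 1)))  # TRUE

# Substituted pieces can themselves be expressions; evaluate afterwards
e2 <- substitute(a + b, list(a = 10, b = quote(z)))
z <- 5
print(eval(e2))                         # 15

# Inside a function, the default environment is the local frame, so an
# argument symbol is replaced by its promise expression
grab <- function(arg) substitute(arg)
print(grab(p * q))                      # p * q
```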

Oh, I loved using Io in the late 2000s (and did all my algorithms assignments in it, probably to the chagrin of the instructor who allowed us to use any language we wanted). Maybe that explains some of my affinity toward R. Is there anything else useful that is similar to it these days?

R is not lazy. It has non-standard evaluation mechanisms (formulas, promises, quosures...) that enable you to write domain-specific languages that "do what the user meant".

If your code (or the code of the libraries you're using) doesn't use any non-standard evaluation tools, evaluation will be eager and work like in any other ALGOL-family language.

It is possible to make some objects behave in a lazy way, but this is also true of many other languages.

R is lazy: it does not evaluate a function's arguments at call time, but only if and when they are used. It is also unusual in that the function can avoid evaluating the argument at all, and instead ask for the quoted expression that produced it (which can then be evaluated manually at the desired point, or multiple times, or in a different environment, etc.), but that is orthogonal to laziness.

To be even more precise, R itself is lazy "all the way through". Because literally every expression in R is syntactic sugar for a function call (including assignments and control structures such as "if"), the only thing a function can really do with an argument is pass it on to another function, so, strictly speaking, there isn't even a distinction between use and non-use. It's just that any R function, in order to do something useful, will ultimately call some non-R leaf function implemented in native code, and some of those leaf functions will actually do the eval if they're defined in terms of argument values (e.g. addition obviously needs to do so to compute its result).
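
The difference between merely passing an argument along and actually needing its value can be seen in a sketch like the one below; `inner_fn`, `outer_fn` and `add` are hypothetical names. Passing `x` through two closures never forces it, while a native primitive such as `+`, which needs the value, does.

```r
inner_fn <- function(y) "done"       # never touches y
outer_fn <- function(x) inner_fn(x)  # only passes x along, wrapped in a new promise

print(outer_fn(stop("never evaluated")))  # "done": the error is never raised

add <- function(a, b) a + b  # `+` is a primitive implemented in C; it needs both values
msg <- tryCatch(add(1, stop("forced by +")), error = function(e) conditionMessage(e))
print(msg)                # "forced by +"
print(is.primitive(`+`))  # TRUE
```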

R's evaluation of arguments is lazy, so while it's not at the level of Haskell, it feels like a lazy language to me. Try e.g.:

  f = function(x) { print('hello'); x }
  f(print('world'))
`x` is not evaluated in f until it is referenced. Indeed, if you remove x from f, "world" is never printed.

Apologies, my bad, but I'm a bit too late to edit. The experts say that R qualifies as a lazy language [1, 2].

My impression was that R was mostly an eager language that somehow allowed for laziness. I will research this further and hopefully suss out why I got confused.

[1] https://dl.acm.org/doi/10.1145/3360579

[2] https://www.r-bloggers.com/2018/07/about-lazy-evaluation/

Thank you for the correction. Is it possible to use NSE (non-standard evaluation) in, say, Python or JavaScript?

Yes. Although those languages don't support it directly, you can simulate lazy evaluation by wrapping your arguments in closures, so they won't be evaluated until they are called within the function body.

Side note: Python's evaluation model can be surprising. List comprehensions create implicit function scopes that can trip you up.

In modern idiomatic Python, you should really be using `pathlib` and `io`, not `os`.

Yeah, you're absolutely right. The `os` functions are closest to base R's file operations, and those in `pathlib` are closest to `fs` (which is a third-party R package that requires installation).