by bnprks 827 days ago
One of the wildest R features I know of comes as a result of lazy argument evaluation combined with the ability to programmatically modify the set of variable bindings. This means that functions can define local variables that are usable by their arguments (i.e. `f(x+1)` can use a value of `x` that is provided from within `f` when evaluating `x+1`). This is used extensively in practice in the dplyr, ggplot, and other tidyverse libraries.
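For anyone who wants to see the mechanism, here is a minimal base-R sketch of the `f(x+1)` case. The function `f` and the value 10 are made up for illustration, and it uses `substitute()` plus `eval()`, which is only one way to get this effect and not necessarily what dplyr does internally.

```r
# Hypothetical f: captures its argument unevaluated, then evaluates it
# in an environment where f itself supplies `x`.
f <- function(arg) {
  expr <- substitute(arg)                  # the unevaluated call x + 1
  env <- new.env(parent = parent.frame())  # fall back to the caller's variables
  assign("x", 10, envir = env)             # binding provided by f, not the caller
  eval(expr, env)
}

result <- f(x + 1)  # works even if no x exists at the call site
print(result)       # [1] 11
```

If `f` had simply used `arg` directly, the promise would be evaluated in the caller's environment and `x` would have to exist there.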

I think software engineers often get turned off by the weird idiosyncrasies of R, but it has surprisingly unique (arguably helpful) language features that most people don't notice, possibly because most of the learning material is data-science focused and doesn't emphasize the bonkers language features that R has.

9 comments

I saw a funny presentation where Doug Bates said something like: "This kind of evaluation opens the door to do many strange and unspeakable things in R... for some reason Hadley Wickham is very excited about this."
Unspeakable horrors like changing `$[` in old Perl5 versions to mess with someone's mind? Who doesn't like array indices starting at 0, 1, ... or 42?
In Dyalog APL you can set the index origin with ⎕IO←0 (or 1) and there are many ways in which this can bite you. In Lua, and I think Fortran, you can specify the range of array indices manually.
One of the stranger behaviours for me is that R allows you to combine infix operators with assignments, even though there are no implemented instances of it in R itself. For example:

  `%in%<-` <- function(x, y, value) { x[x %in% y] <- value; x}

  x <- c("a", "b", "c", "d")
  x %in% c("a", "c") <- "o"
  x
  [1] "o" "b" "o" "d"
Or slightly crazier:

  `<-<-` <- function(x, y, value) paste0(y, "_", value)

  "a" -> x <- "b"
  x
  [1] "a_b"
Antoine Fabri and I created a package that uses this behaviour for some clever replacement operators [1], but beyond that I don't see where this could be useful in real practice.

[1]: https://github.com/moodymudskipper/inops

For those who haven't run into anything about this corner of R before:

https://blog.moertel.com/posts/2006-01-20-wondrous-oddities-...

That sounds like asking for trouble. Someone coming from any other programming language could easily forget that expression evaluation is stateful. Better to be explicit and create an object representing an expression. Tell me, at least, that the variable is immutable in that context?
The good news is that most variables in R are immutable with copy-on-write semantics. Therefore, most of the time everything here will be side-effect-free and any weird editing of the variable bindings is confined to within the function. (In my experience, the cases that would have side effects are very rarely used.)
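A small sketch of what that looks like in practice. The names `bump_first`, `counter`, and `increment` are made up for illustration; environments are shown as the main exception I'm sure of, since they are passed by reference.

```r
# Modifying an argument inside a function only changes a local copy.
bump_first <- function(v) {
  v[1] <- v[1] + 100
  v
}

a <- c(1, 2, 3)
b <- bump_first(a)
print(a)  # [1] 1 2 3      (caller's vector is untouched)
print(b)  # [1] 101   2   3

# Environments are mutable, so changes made inside a function are visible outside.
counter <- new.env()
counter$n <- 0
increment <- function(env) env$n <- env$n + 1
increment(counter)
print(counter$n)  # [1] 1
```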
The whole magic is that expressions are in fact just objects in the language. And no, there aren't any immutable bindings here.
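To make "expressions are just objects" concrete, here is a short base-R sketch (the variable names are arbitrary): a quoted call can be inspected and edited like a list before it is evaluated.

```r
e <- quote(x + 1)           # an unevaluated call, not a number
print(class(e))             # [1] "call"

e[[1]] <- as.name("*")      # swap the function being called: x * 1
e[[3]] <- 10                # swap the second argument:      x * 10
print(deparse(e))           # [1] "x * 10"

x <- 5
value <- eval(e)            # evaluate the edited expression
print(value)                # [1] 50
```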
It's crazy how literally R takes "Everything's an object." While parentheses can be treated like syntax when writing code, it's actually a function named `(`.
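You can check this at the console with nothing but base R; a quick sketch:

```r
is_fun <- is.function(`(`)       # parentheses are a real function object
print(is_fun)                    # [1] TRUE

paren_result <- `(`(1 + 2)       # calling it by name just returns its argument
plus_result  <- `+`(1, 2)        # same trick works for operators
print(c(paren_result, plus_result))  # [1] 3 3

# The parentheses also appear as a call in the parsed expression tree.
tree <- quote((a + b) * c)
inner_fun <- as.character(tree[[2]][[1]])
print(inner_fun)                 # [1] "("
```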

Of course, playing with magic sounds fun until you remember you're trying to tell a computer to do a specific set of steps. Then magic looks more like a curse.

Asking out of lack of experience with R: how does such an invocation handle the case where `x` is defined with a different value at the call site?

In pseudocode:

  f =
  let x = 1 in # inner vars for f go here
  arg -> arg + 1 # function logic goes here

  # example one: no external value
  f (x+1) # produces 3 (arg := (x+1) = 2; return arg +1)

  # example two: x is defined in the outer scope
  let x = 4 in
  f (x+2) # produces 7 (arg := 4+2 = 6; return arg + 1)? Or 4 if inner x wins as in example one (arg := 1+2 = 3)?
If the function chooses to overwrite the value of a variable binding, it doesn't matter how it is defined at the call site (so inner x wins in your example). In the tidyverse libraries, they often populate a lazy list variable (think Python dictionary) that allows disambiguating in the case of name conflicts between the call site and programmatic bindings. But that's purely a library convention and not something the language solves.
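Here is a rough base-R sketch of that convention. `with_data` is a hypothetical helper, not a tidyverse function; the `.data` and `.env` names mimic the pronouns the tidyverse uses, but the implementation here is only an assumption about the general idea.

```r
# Hypothetical helper: evaluate `expr` with the columns of `data` taking
# priority over the caller's variables, plus two names for disambiguation.
with_data <- function(data, expr) {
  expr   <- substitute(expr)
  caller <- parent.frame()
  mask   <- list2env(data, parent = caller)  # data bindings shadow the caller's
  mask$.data <- data                         # explicit access to the data
  mask$.env  <- caller                       # explicit access to the call site
  eval(expr, mask)
}

x  <- 4            # defined at the call site
df <- list(x = 1)  # "inner" x supplied through the data

inner_wins <- with_data(df, x + 2)             # 1 + 2
outer_x    <- with_data(df, .env$x + 2)        # 4 + 2
both       <- with_data(df, .data$x + .env$x)  # 1 + 4
print(c(inner_wins, outer_x, both))            # [1] 3 6 5
```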
Well the point is that the function can define its own logic to determine the behaviour. Users can also (with some limits) restrict the variable scope.
A lot of the time you're not actually using the value passed to the function, but the name of the argument (f(x) instead of f('x')), which helps the user with their query (dplyr) or configuration (ggplot2).
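A minimal base-R sketch of using the name instead of the value. `pick` and the example data frame are made up for illustration; dplyr and ggplot2 do something more elaborate than this.

```r
# Hypothetical helper: `col` is never evaluated, only its name is used.
pick <- function(data, col) {
  name <- deparse(substitute(col))  # turns the bare symbol into the string "speed"
  data[[name]]
}

df <- data.frame(speed = c(4, 7), dist = c(2, 10))
picked <- pick(df, speed)  # no variable called `speed` exists at the call site
print(picked)              # [1] 4 7
```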
> I think software engineers often get turned off by the weird idiosyncrasies of R

That was at least true when I was looking at it. I didn't get it, but the data guys came away loving it. I came away from that whole experience really appreciating how far you can get with an "unclean" design if you persist, and how my gut feeling of good (with all the heuristics for quality that entails) is really very domain specific.

I once needed to implement an API in R. Let's just say that having three or four object-oriented systems did not help at all.
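For readers who haven't seen them side by side, here is a sketch of the same toy "square with an area" in the three systems that ship with base R (S3, S4, and Reference Classes); R6 is a fourth, available as a CRAN package and not shown. All class and function names are invented for the example.

```r
library(methods)

# S3: classes are just an attribute, dispatch happens via UseMethod().
area <- function(shape) UseMethod("area")
area.square <- function(shape) shape$side^2
sq_s3 <- structure(list(side = 2), class = "square")

# S4: formal class definitions, generics, and slots accessed with @.
setClass("Square", representation(side = "numeric"))
setGeneric("area4", function(shape) standardGeneric("area4"))
setMethod("area4", "Square", function(shape) shape@side^2)
sq_s4 <- new("Square", side = 2)

# Reference Classes: mutable objects with methods attached to the object.
SquareRC <- setRefClass("SquareRC",
  fields  = list(side = "numeric"),
  methods = list(area = function() side^2)
)
sq_rc <- SquareRC$new(side = 2)

areas <- c(area(sq_s3), area4(sq_s4), sq_rc$area())
print(areas)  # [1] 4 4 4
```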
I had a colleague at Google who used to say: "The best thing about R is that it was created by statisticians. The worst thing about R is that it was created by statisticians."