Hacker News
by wenc 2017 days ago
I'm sure some of us who are out of the loop might be wondering: what about the magrittr pipe operator (%>%) that we all know and love?

Luke Tierney explains the move from %>% to a native pipe |> here [1]. The native pipe aims to be more efficient and to address issues with the magrittr pipe, such as complex stack traces.

Turns out the |> syntax is also used in Julia, Javascript and F#.

The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

[1] https://www.youtube.com/watch?v=X_eDHNVceCU&feature=youtu.be...

9 comments

In case anyone is curious about the origin of the (|>) pipeline symbol:

Although F# is its most well known early popularizer, it originated in Isabelle/ML, 1994, proposed by Tobias Nipkow.

Here is a blog post by Don Syme which embeds the email thread of its invention: https://web.archive.org/web/20190217164203/https://blogs.msd...

It's a fascinating look through time.

Of course, I should note this is the history for the pipe-forward operator for chaining (reverse) function application used in a programming language. The general concept is even earlier, as attested by the shell syntax for chaining anonymous pipes https://en.wikipedia.org/wiki/Pipeline_(Unix)#History.

Metanote: I was surprised I was unable to find an answer to who invented the (|>) pipe syntax through Google. I could only find this Elixir Forum thread https://elixirforum.com/t/which-language-first-introduced-th... which got close but did not have the answer. I am therefore writing this here to hopefully surface it for future searches and "question answering AIs".

Woah, I hadn't known that!

And given that I'm currently staring at Isabelle code most of the day for my Master's thesis at the chair of Prof. Nipkow, it's slightly surreal to learn about this here, heh.

Thanks, the video helped explain some things, along with this post from the R-devel list:

https://stat.ethz.ch/pipermail/r-devel/2020-December/080173....

The reason for announcing the new lambda syntax at the same time seems to be to enable certain workflows that the magrittr pipe supports. The %>% operator, by default, pipes to the first argument of a function. If you want to pipe to a different argument, you can do:

a %>% func(x, arg2 = .)

It seems like the native pipe doesn't support a placement argument, but you can use the new, more concise lambda operator:

a |> \(d) func(x, arg2 = d)

A little more verbose, but it's not a very common use case, it's more general, and I'd happily trade a little more verbosity for the rest of the improvements. (That said, I haven't played around with the magrittr 2.0 improvements yet, so maybe the difference is going to end up being less than the presentation suggests.)
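
For anyone trying this in a released version: as far as I know, R 4.1 and later require a function call on the right-hand side of |>, so a bare lambda there is rejected and has to be parenthesized and called. A minimal sketch, where func and its arguments are made-up stand-ins rather than anything from the announcement:

```r
# Hypothetical helper, only for illustration: the piped value is wanted in `arg2`.
func <- function(x, arg2) paste(x, arg2)

a <- "piped"

# The lambda is wrapped in parentheses and called explicitly, so the
# right-hand side of |> is a call.
res <- a |> (\(d) func("first", arg2 = d))()
print(res)

# R >= 4.2 also accepts a placeholder for a named argument:
#   a |> func("first", arg2 = _)
```

The extra parentheses are the price of the pipe being a pure syntax transformation: the left-hand side is simply spliced in as the first argument of whatever call follows.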

The use of "." as an argument is actually probably one of my most common wtf's with pipes in general.

I tend to use it a lot if I'm just piping a vector to base functions (gsub takes x as its third argument, grep as its second).

This syntax looks like it makes that a little harder, but the new error messages are going to make everything so much better that I'm totally fine with it.

It is particularly infuriating in R, because

    lm(y ~ ., data = my_dataframe)
already means "regress the variable y on all other columns in `my_dataframe`." For big, interactive regressions, it's really natural to write

    my_original_dataframe %>%
        do_a_bunch_of_transformations() %>%
        select(...) %>% # Pull out just the columns you want
        lm(y ~ ., data = .)
and god knows how that last line is going to be interpreted. So disambiguating through some mechanism is necessary anyway. A lambda is much better than some temporary variable that just holds the formula `y ~ .`.
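
A rough sketch of the lambda version with the native pipe, assuming R >= 4.1 and using the built-in mtcars data with an arbitrary choice of columns (not the data frame from the comment above):

```r
# Keep only the response and the predictors of interest.
d3 <- mtcars[, c("mpg", "wt", "hp")]

# Inside the lambda the data has an explicit name, so the only `.` left
# is the formula's "all other columns".
fit <- d3 |> (\(d) lm(mpg ~ ., data = d))()

print(names(coef(fit)))
```

Because the data frame is called d inside the lambda, there is no second meaning of `.` for the reader or the pipe to resolve.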
The zfit package is intended to address this issue, with the zlm() and comparable functions that are very thin wrappers around lm() and friends. The only thing they do is flip the argument order so the data comes first, making exactly this use case much simpler. So you can do:

    cars %>% zlm(dist ~ speed)
(or now)

    cars |> zlm(dist ~ speed)
https://github.com/torfason/zfit
Tbh, I would 1000% rather my coworkers write a lambda function or closure where it's necessary than add a new package dependency just to change the order of arguments in widely used functions.

Plus, I still wouldn't trust the code

    cars %>% zlm(dist ~ .)
to necessarily work the way I want, or to work the same way across package versions.
I think magrittr 2.0 has addressed that problem also.

|> is also used in Elixir where it is implemented as a macro so it’s a little less flexible since it can’t be assigned as a value.

> Turns out the |> syntax is also used in Julia, Javascript and F#.

Note that for JS it's still just a proposal and has been stuck in an interminable bikeshedding phase for most of this year.

Admittedly, the `|>` JavaScript syntax is complicated by unclear async behavior.

I'm excited for it, though, and if the partial application syntax `func(a, ?)` gets ratified then we'll have a nice concise way of describing operations.

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

Personally I think it would be a good idea if you could e.g. configure your keyboard so that AltGr+L produces λ, which you can then use in place of \

But alas, the Haskell community has decided against this:

https://gitlab.haskell.org/ghc/ghc/-/issues/1102

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

The proposed lambda syntax for R is `\(x) x+1` so `\` will just be shorthand for `function`.
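
A quick way to check the shorthand claim, sketched under the assumption of R >= 4.1 (f and g are just illustrative names):

```r
f <- function(x) x + 1
g <- \(x) x + 1

# Both spellings should yield functions with the same arguments and body.
same_formals <- identical(formals(f), formals(g))
same_body    <- identical(body(f), body(g))

print(c(same_formals, same_body))
print(g(1))
```

Note there is no arrow between the argument list and the body, unlike the Haskell form quoted above.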

The anonymous function change is probably a (small) mistake.

    function(x) {x + 1} 
is already logically equivalent to, and from some perspectives arguably a syntactic improvement on,

    \(x) x + 1
Giving everyone two ways of doing one thing just means the tutorials will be fragmented and beginners even more confused.

Tierney mentioned that tidyverse found function(x) too verbose and uses formula syntax. Given how often tidyverse uses the "y ~ x" formula notation, this might actually be picking up deficiencies in R's macro system rather than in the function notation, and the problem got misdiagnosed.

Having the option of writing

    \(x) x+1
instead of

    function(x) x+1
not only saves a few keystrokes.

It will also produce shorter, and clearer, lines of code.

What I don’t understand is the reference to “formula syntax”. What is the issue and how does the new syntax solve it?

Formula syntax is this stuff [0, 1]. Tidyverse uses it to accomplish some non-model stuff. The one that leaps to my mind is faceting [2]. I'd expect that sort of thing to be handled by macros.

And all the rest of the comment I wouldn't have typed except I'm already replying, since I know this is one of those two-types-of-people-who-don't-change-opinions situations. But...

> not only saves a few keystrokes.

R is secretly a lisp. People can define whatever they want to be whatever they want. Pipes were already implemented in a library (try doing that in Python). Make your own library or bind \ to a keyboard macro or something if your fingers are on the point of crumbling under the stress of those 7 keystrokes.
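
To make the "implemented in a library" point concrete, here is a toy sketch of a user-defined pipe in plain R. The operator name %then% is made up, and this is far simpler than what magrittr actually does (it only accepts a function on the right, with no argument placement):

```r
# Any name wrapped in %...% can be defined as a binary operator.
`%then%` <- function(lhs, rhs) rhs(lhs)

# Custom operators are left-associative, so this is sum(sqrt(c(1, 4, 9))).
res <- c(1, 4, 9) %then% sqrt %then% sum
print(res)
```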

Defaults that use real words to describe things are good. The function to create a function being function() is eminently reasonable. \() is meaningless and about as useful as a one-word variable.

> It will also produce shorter, and clearer, lines of code.

Opinions very much divide. Code length is only a proxy for load on a reader's short-term memory, which is what matters. \ is going to put more burden on someone if they aren't very familiar with R. Most R coders are not full-time programmers and not very good at R.

[0] https://www.rdocumentation.org/packages/base/versions/3.6.2/...

[1] https://www.rdocumentation.org/packages/stats/versions/3.6.2...

[2] http://www.cookbook-r.com/Graphs/Facets_(ggplot2)/

I know what formulas are, I just don’t see what’s the connection with the proposed change:

     ‘\(x) x + 1’ is parsed as ‘function(x) x + 1’
I also know that R has some vestigial Scheme under the hood, but the syntax is not Lisp (it was taken from S). In Common Lisp one could easily use a macro or reader macro, but in R a change in the parser is needed so “\” can be used instead of “function”.

Note that the existing “f <- function(...) ...” syntax is not being removed. But I write a lot of code like

    state.range <- apply(state.x77, 2, function(x) c(min(x), median(x), max(x)))
and it will be an improvement to be able to condense it a bit to

    state.range <- apply(state.x77, 2, \(x) c(min(x), median(x), max(x)))
If you’re worried about giving programmers too many options, R is already a nightmarish lost cause...

Thanks for clearing that up. I was wondering what I have been using all these years in lapply ...

Beautiful. - Elixir has this as well. Love it. - In Mathematica you'd have to write `data // f // g` to denote `g(f(data))`.

## Edit

Corrected a notation.

I don’t know Mathematica, but wouldn’t data // g // f make more sense for f(g(data))?

You’re right. That was a typo.

If you use VS Code, this is an invaluable snippet:

https://slickb.it/bits/70

It's only a proposal in JS. For long function chains I like to use intermediate variables, as they make the code easier to understand.

Genuinely curious: why not combine the syntax? Do we need 2 different pipes in R? When to use which? Thanks for your thoughts!

wenc's comment (currently top) links to a video where Luke Tierney explains why the magrittr pipe is not optimal, so they are looking for a native solution.