HN Mirror


by wenc 2064 days ago

I'm sure some of us who are out of the loop might be wondering: what about the magrittr pipe operator (%>%) that we all know and love?

Luke Tierney explains the move from %>% to a native pipe |> here [1]. The native pipe aims to be more efficient and to address issues with the magrittr pipe, such as complex stack traces.

Turns out the |> syntax is also used in Julia, Javascript and F#.

The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

[1] https://www.youtube.com/watch?v=X_eDHNVceCU&feature=youtu.be...
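A quick way to see the difference from magrittr, assuming R 4.1.0 or later where `|>` is built into the parser: the native pipe is rewritten during parsing, so `x |> f(y)` becomes the ordinary call `f(x, y)` before anything runs. As I understand it, this is where the efficiency and cleaner stack traces come from, since no pipe function is called at run time. `f`, `x`, and `y` below are placeholder symbols that are never evaluated.

```r
# Requires R >= 4.1.0. quote() only parses; f, x and y are never evaluated.
piped  <- quote(x |> f(y))
direct <- quote(f(x, y))

print(piped)               # f(x, y): no trace of |> is left after parsing
identical(piped, direct)   # TRUE
```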

9 comments

Cybiote 2064 days ago

In the case anyone is curious about the origin of the (|>) pipeline symbol:

Although F# is its best-known early popularizer, it originated in Isabelle/ML in 1994 and was proposed by Tobias Nipkow.

Here is a blog post by Don Syme which embeds the email thread of its invention: https://web.archive.org/web/20190217164203/https://blogs.msd...

It's a fascinating look through time.

Of course, I should note this is the history of the pipe-forward operator for chaining (reverse) function application in a programming language. The general concept is even older, as attested by the shell syntax for chaining anonymous pipes: https://en.wikipedia.org/wiki/Pipeline_(Unix)#History.

Metanote: I was surprised that I couldn't find out through Google who invented the (|>) pipe syntax. I could only find this Elixir Forum thread https://elixirforum.com/t/which-language-first-introduced-th... which got close but did not have the answer. I am therefore writing this here to hopefully surface it for future searches and "question answering AIs".

mckirk 2064 days ago

Woah, I hadn't known that!

And given that I'm currently staring at Isabelle code most of the day for my Master's thesis at the chair of Prof. Nipkow, it's slightly surreal to learn about this here, heh.

cwyers 2064 days ago

Thanks, the video helped explain some things, along with this post from the R-devel list:

https://stat.ethz.ch/pipermail/r-devel/2020-December/080173....

The reason for announcing the new lambda syntax at the same time seems to be to enable certain workflows that the magrittr pipe supports. The %>% operator, by default, pipes to the first argument of a function. If you want to pipe to a different argument, you can do:

    a %>% func(x, arg2 = .)

It seems like the native pipe doesn't support a placeholder argument, but you can use the new, more concise lambda syntax:

    a |> \(d) func(x, arg2 = d)

It's a little more verbose, but this isn't a very common use case, the lambda is more general, and I'd happily trade a little more verbosity for the rest of the improvements. (That said, I haven't played around with the magrittr 2.0 improvements yet, so the difference may end up being smaller than the presentation suggests.)
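One caveat, as far as I can tell: the released R 4.1.0 pipe requires its right-hand side to be a call, so a bare `\(d) ...` as in the snippet above (which predates that release) is rejected, and the lambda has to be wrapped in parentheses and then called. A minimal sketch, with `seq()`'s `by` argument standing in for the hypothetical `func(x, arg2 = .)`:

```r
# Requires R >= 4.1.0.
# magrittr form, for comparison (not run): 2 %>% seq(1, 10, by = .)
by_step <- 2

# The parenthesized lambda is the call on the right-hand side of |>;
# the piped value becomes its argument d, which is placed in `by`.
odds <- by_step |> (\(d) seq(1, 10, by = d))()
print(odds)   # [1] 1 3 5 7 9
```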

disgruntledphd2 2064 days ago

The use of "." as an argument is actually probably one of my most common wtf's with pipes in general.

I tend to use it a lot if I'm just piping a vector to base functions (gsub takes x as its third argument, and grep as its second).

This syntax looks like it makes that a little harder, but the new error messages are going to make everything so much better that I'm totally fine with it.
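To make this concrete for the base string functions mentioned above, here is a sketch (R 4.1.0 or later) that pipes a character vector into the `x` argument of `gsub()` and `grep()` through a parenthesized lambda:

```r
# Requires R >= 4.1.0.
# gsub(pattern, replacement, x): x is the third argument
codes <- c("a1", "b22", "c333")
letters_only <- codes |> (\(s) gsub("[0-9]", "", s))()
print(letters_only)   # [1] "a" "b" "c"

# grep(pattern, x, ...): x is the second argument
fruit <- c("apple", "banana", "cherry")
has_an <- fruit |> (\(s) grep("an", s, value = TRUE))()
print(has_an)         # [1] "banana"
```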

grayclhn 2064 days ago

It is particularly infuriating in R, because

    lm(y ~ ., data = my_dataframe)

already means "regress the variable y on all other columns in `my_dataframe`." For big, interactive regressions, it's really natural to write

    my_original_dataframe %>%
        do_a_bunch_of_transformations() %>%
        select(...) %>% # Pull out just the columns you want
        lm(y ~ ., data = .)

and god knows how that last line is going to be interpreted. So disambiguating through some mechanism is necessary anyway. A lambda is much better than some temporary variable that just holds the formula `y ~ .`.
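A minimal sketch of the lambda approach with the native pipe (R 4.1.0 or later), using the built-in `mtcars` data in place of the hypothetical transformations above. Inside the lambda, `d` names the piped data frame, so the only `.` left is the formula's own "all other columns":

```r
# Requires R >= 4.1.0.
# Keep three columns, then regress mpg on all the remaining ones.
fit <- mtcars |>
  subset(select = c(mpg, cyl, disp)) |>
  (\(d) lm(mpg ~ ., data = d))()

names(coef(fit))   # "(Intercept)" "cyl" "disp"
```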

torfason 2063 days ago

The zfit package is intended to address this issue, with the zlm() and comparable functions that are very thin wrappers around lm() and friends. The only thing they do is flip the argument order so the data comes first, making exactly this use case much simpler. So you can do:

    cars %>% zlm(dist ~ speed)

(or now)

    cars |> zlm(dist ~ speed)

https://github.com/torfason/zfit
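The argument flip itself can be sketched as a local one-line wrapper. `lm_data_first()` is a hypothetical name used only for illustration; it is not zfit's code, and zfit's actual implementation may differ.

```r
# Hypothetical helper for illustration (not part of zfit or base R):
# it takes the data first so the pipe can fill it in.
lm_data_first <- function(data, formula, ...) {
  lm(formula, data = data, ...)
}

fit_piped  <- cars |> lm_data_first(dist ~ speed)   # requires R >= 4.1.0
fit_direct <- lm(dist ~ speed, data = cars)

all.equal(coef(fit_piped), coef(fit_direct))   # TRUE
```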

grayclhn 2063 days ago

Tbh, I would 1000% rather my coworkers write a lambda function or closure where it's necessary than add a new package dependency just to change the order of arguments in widely used functions.

Plus, I still wouldn't trust the code

    cars %>% zlm(dist ~ .)

to necessarily work the way I want, or to work the same way across package versions.

dash2 2064 days ago

I think magrittr 2.0 has addressed that problem also.

dugmartin 2064 days ago

|> is also used in Elixir where it is implemented as a macro so it’s a little less flexible since it can’t be assigned as a value.

crooked-v 2064 days ago

> Turns out the |> syntax is also used in Julia, Javascript and F#.

Note that for JS it's still just a proposal and has been stuck in an interminable bikeshedding phase for most of this year.

pkage 2064 days ago

Admittedly, the `|>` javascript syntax is complicated by unclear async behavior.

I'm excited for it, though, and if the partial application syntax `func(a, ?)` gets ratified then we'll have a nice concise way of describing operations.

amelius 2064 days ago

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

Personally I think it would be a good idea if you could e.g. configure your keyboard so that AltGr+L produces λ, which you can then use in place of \

But alas, the Haskell community has decided against this:

https://gitlab.haskell.org/ghc/ghc/-/issues/1102

sieste 2064 days ago

> The lambda syntax (\(x) -> x + 1) is similar to Haskell's (\x -> x + 1).

The proposed lambda syntax for R is `\(x) x+1`, so `\` will just be shorthand for `function`.
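A quick check of that shorthand (R 4.1.0 or later): the two spellings below produce functions with the same arguments and the same body.

```r
# Requires R >= 4.1.0.
add_one_short <- \(x) x + 1
add_one_long  <- function(x) x + 1

identical(formals(add_one_short), formals(add_one_long))   # TRUE
identical(body(add_one_short), body(add_one_long))         # TRUE
add_one_short(41)                                          # [1] 42
```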

roenxi 2064 days ago

The anonymous function change is probably a (small) mistake.

    function(x) {x + 1}

is already logically equivalent to, and from some perspectives arguably a syntactic improvement on,

    \(x) x + 1

Giving everyone two ways of doing one thing just means the tutorials will be fragmented and beginners even more confused.

Tierney mentioned that the tidyverse found function(x) too verbose and uses formula syntax instead. Given how often the tidyverse uses the "y ~ x" formula notation, this might actually be picking up deficiencies in R's macro system rather than in the function notation, and the problem got misdiagnosed.

kgwgk 2064 days ago

Having the option of writing

    \(x) x+1

instead of

    function(x) x+1

not only saves a few keystrokes.

It will also produce shorter, and clearer, lines of code.

What I don’t understand is the reference to “formula syntax”. What is the issue and how does the new syntax solve it?

roenxi 2064 days ago

Formula syntax is this stuff [0, 1]. Tidyverse uses it to accomplish some non-model stuff. The one that leaps to my mind is faceting [2]. I'd expect that sort of thing to be handled by macros.

And all the rest of the comment I wouldn't have typed except I'm already replying, since I know this is one of those two-types-of-people-who-don't-change-opinions situations. But...

> not only saves a few keystrokes.

R is secretly a lisp. People can define whatever they want to be whatever they want. Pipes were already implemented in a library (try doing that in Python). Make your own library or bind \ to a keyboard macro or something if your fingers are on the point of crumbling under the stress of those 7 keystrokes.

Defaults that use real words to describe things are good. The function to create a function being function() is eminently reasonable. \() is meaningless and about as useful as a one-letter variable name.

> It will also produce shorter, and clearer, lines of code.

Opinions very much divide. Code length is only a proxy for load on a reader's short-term memory, which is what matters. \ is going to put more burden on someone if they aren't very familiar with R. Most R coders are not full-time programmers and not very good at R.

[0] https://www.rdocumentation.org/packages/base/versions/3.6.2/...
[1] https://www.rdocumentation.org/packages/stats/versions/3.6.2...
[2] http://www.cookbook-r.com/Graphs/Facets_(ggplot2)/

kgwgk 2064 days ago

I know what formulas are, I just don’t see what’s the connection with the proposed change:

     ‘\(x) x + 1’ is parsed as ‘function(x) x + 1’

I also know that R has some vestigial scheme under the hood, but the syntax is not lisp (it was taken from S). In common lisp one could easily use a macro or reader macro but in R a change in the parser is needed so “\” can be used instead of “function”.

Note that the existing “f <- function(...) ...” syntax is not being removed. But I write a lot of code like

    state.range <- apply(state.x77, 2, function(x) c(min(x), median(x), max(x)))

and it will be an improvement to be able to condense it a bit to

    state.range <- apply(state.x77, 2, \(x) c(min(x), median(x), max(x)))

laretluval 2064 days ago

If you’re worried about giving programmers too many options, R is already a nightmarish lost cause...

c06n 2064 days ago

Thanks for clearing that up. I was wondering what I have been using all these years in lapply ...

submeta 2064 days ago

Beautiful. - Elixir has this as well. Love it. - In Mathematica you'd have to write `data // f // g` to denote `g(f(data))`.

## Edit

Corrected a notation.

nonfamous 2064 days ago

I don’t know Mathematica, but wouldn’t data // g // f make more sense for f(g(data)) ?

submeta 2064 days ago

You're right. That was a typo.
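For comparison, R's native pipe (4.1.0 or later) reads in the same left-to-right order as the corrected Mathematica notation: `x |> f() |> g()` is `g(f(x))`. Here `sqrt()` and `sum()` play the roles of `f` and `g`:

```r
# Requires R >= 4.1.0.
x <- c(1, 4, 9)
piped  <- x |> sqrt() |> sum()   # sqrt first, then sum
nested <- sum(sqrt(x))           # the same thing written inside-out

identical(piped, nested)   # TRUE; both are 6
```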

dnautics 2064 days ago

if you use vscode this is an invaluable vscode snippet:

https://slickb.it/bits/70

z3t4 2064 days ago

It's only a proposal in JS. For long function chains I like to use intermediate variables, as they make the code easier to understand.
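The same trade-off shows up in R. A small sketch (R 4.1.0 or later) with the built-in `mtcars` data: the piped version and the named-intermediate version compute the same value, so the choice comes down to readability.

```r
# Requires R >= 4.1.0.
# Pipe: keep 4-cylinder cars, then average mpg
mean_piped <- mtcars |> subset(cyl == 4) |> with(mean(mpg))

# Same steps with a named intermediate variable
four_cyl   <- subset(mtcars, cyl == 4)
mean_steps <- mean(four_cyl$mpg)

identical(mean_piped, mean_steps)   # TRUE
```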

antipaul 2064 days ago

Genuinely curious why not combine syntax? Do we need 2 different pipes in R? When to use which? Thanks for your thoughts!

sieste 2064 days ago

wenc's comment (currently top) links to a video where Luke Tierney explains why the magrittr pipe is not optimal, so they are looking for a native solution.