

by REDS1736 904 days ago

I agree that your proposed operator simplifies the specific (but not uncommon!) situation you describe. But I don't think it exposes a problem to be solved; rather, it exposes an antipattern to be avoided. Consider the following example, in which mytransform is some function I want to apply to a part of the dataset:

  mydata <- read.csv('mydata.csv')
  mydata['score'] <- mytransform(mydata['score'])

This can be simplified with your proposed operator:

  mydata['score'] <|> mytransform()
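For anyone who wants to try this out today: as far as I know, R only allows user-defined infix operators of the form %name%, so a literal <|> cannot be defined. Below is a minimal sketch of the same idea under a hypothetical name, %<|>%. It assumes the right-hand side is always written as a function call and that the left-hand side goes in as its first argument; this is my guess at the semantics, not the proposal's actual implementation. (magrittr's %<>% operator covers similar ground.)

```r
# Hypothetical assignment-pipe operator; the name %<|>% is made up for this sketch.
`%<|>%` <- function(lhs, rhs) {
  lhs_expr <- substitute(lhs)   # e.g. mydata['score'], unevaluated
  rhs_expr <- substitute(rhs)   # e.g. mytransform(), unevaluated
  env <- parent.frame()

  # Build mytransform(mydata['score'], ...) by inserting lhs as the first argument.
  piped <- as.call(c(list(rhs_expr[[1]], lhs_expr), as.list(rhs_expr)[-1]))
  value <- eval(piped, env)

  # Assign the result back to the left-hand side in the caller's environment.
  eval(call("<-", lhs_expr, value), env)
  invisible(value)
}

# Stand-in data and transform, assumed only for illustration.
mydata <- data.frame(score = c(1, 2, 3))
mytransform <- function(x) x * 2

mydata['score'] %<|>% mytransform()
mydata$score  # scores are now 2, 4, 6
```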

An upgrade in elegance, I like it. But R is commonly used by switching back and forth between scripts and the REPL, with an IDE that by default captures your workspace so that all your variables etc. are restored the next time you resume the session. In this environment I feel that variables should be used as constants as much as possible. Mutating a variable (as in my example) creates room for confusion: Does mydata contain the raw CSV data or the transformed data? Did I already evaluate this line in the REPL or not? What happens if I evaluate this line twice? Many people I know tend to "jump around" in their scripts instead of following the written order of operations, which can leave the environment in a state that cannot be reproduced. My proposed solution is to treat all variables (or at least as many as possible) like constants:

  mydata <- read.csv('mydata.csv')
  mydata_transformed <- transform(mydata, score = mytransform(score))

Now I always know what a variable contains, because each variable holds exactly the same value no matter when, or how often, I evaluate its line.
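To make the "evaluate twice" problem concrete, here is a small sketch. The in-memory data frame stands in for read.csv('mydata.csv'), and mytransform is assumed to simply double the scores; both are placeholders, not anything from a real dataset.

```r
# Stand-ins, assumed only for illustration.
mydata <- data.frame(score = c(1, 2, 3))
mytransform <- function(x) x * 2

# Mutating style: each evaluation of the line changes the state again.
mutated <- mydata
mutated['score'] <- mytransform(mutated['score'])
mutated['score'] <- mytransform(mutated['score'])  # same line, run a second time by accident

# Constant style: the raw data is never overwritten, so re-running is harmless.
mydata_transformed <- transform(mydata, score = mytransform(score))
mydata_transformed <- transform(mydata, score = mytransform(score))  # run again, same result

mutated$score             # 4, 8, 12: transformed twice
mydata_transformed$score  # 2, 4, 6: transformed once, however often the line runs
mydata$score              # 1, 2, 3: raw data still intact
```

The second block uses base R's transform(), which returns a modified copy of the data frame instead of changing it in place.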

Nevertheless, I kinda went on a tangent here. Strictly speaking, the problem I describe only arises from careless user behavior (which is quite prevalent in statistics, though). Aside from that, I think this operator is an elegant idea!