Hacker News new | ask | show | jobs
by ryanmonroe 1611 days ago
I hadn't thought of that tradeoff. After testing just now, if you have a column named `.data` or `.env` those constructs work as if there was no such column, and actually in that case `mutate(df, b = .data + 1)` is an error.

Personally I'll happily take not being able to use those as column names if it means I can avoid always typing : before every in-data variable, but your comment gave me a better understanding of why it would be bad for some other person or scenario, perhaps where short term ease-of-use is lower on the list of priorities.

For your second example, it doesn't come up in R because a data frame column cannot be a function. Columns must be vectors (including lists) and you could have a vector where one or all elements are functions, but the column itself cannot not be a function (functions are not vectors), so there's no ambiguity there. To call a function stored in your data frame you'd have to access an element of the column, and any access method, e.g. `[[` or `$` would make the resulting set of characters invalid as the name of an object (without backticks, which would then disambiguate the intent)

    df <- tibble(x = list(function(x) x + 1))
    df %>% 
      mutate(y = x[[1]](3))
Separate from dplyr, in R when you use `(` to call a function it searches only for functions by that name.

    log <- 3
    log(1)
    # 0

    frog <- 3
    frog(3)
    # Error in frog(3) : could not find function "frog"
    
    log <- function(x) x^2
    log(1)
    # 1
1 comments

In Julia you could have an `AbstractVector` type also be callable, or more likely a vector of callable objects (and the operation is performed row-wise).

I agree it's unlikely that a user will name their column `.data`. But it certainly saves developer effort from thinking about these issues.

The larger concern, really, is that Julia needs to know which things are columns and which things are variables in an expression at parse time in order to generate fast code for a DataFrame. It needs to do this without inspecting the data frame, since the data frame's contents aren't known at parse time.

One option would be to make all literals columns. But then you run into issues with things like `missing`, which would have to be escaped or not recognized as a column. Its hard to predict all the problems there, and any escaping rules would definitely have to be more complicated than R's. So we require `:` and take the easy way out, which has the added benefit for new users who might get confused about the variable-column distinction.