by yowlingcat 1260 days ago
> Yes, they are. But if you look at pandas `f['a'] <= 3` a boolean mask is created eagerly, on the fly. Pandas has zero chance to do anything clever here.

Isn't this just an implementation detail? It seems like it wouldn't be hard to treat this as syntactic sugar rather than forcing eager evaluation. I.e., `f['a'] <= 3` could just as easily evaluate to a node in a computation graph rather than to the result of evaluating that graph. For example, I could imagine something like this:

```

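# hypothetical API for illustration, not the real polars module layout
from polars.dataframe import LazyDataFrame, DataFrame

def fn():
  ...
  ldf = LazyDataFrame(df)
  # this mutates the computation graph but doesn't evaluate;
  # the mask is built from the lazy frame, so no boolean array exists yet
  ldf.loc[ldf['a'] <= 3, "b"] = ldf['b']
  # converting back to an eager frame is what triggers evaluation
  df = DataFrame(ldf)
  return df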
```

This is a toy example, so I'm not sure the evaluation part makes complete sense, but the eager evaluation in pandas seems like a shortcoming of its implementation and model rather than of the syntactic sugar itself.
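To make the "sugar, not eager evaluation" point concrete, here's a minimal pure-Python sketch of how `<=` can return a graph node instead of a mask. `Col` and `Expr` are names I made up for illustration; this is not how pandas or Polars are implemented, just the general operator-overloading trick.

```python
import operator


class Col:
    """Hypothetical column reference; comparing it builds a node, not a mask."""

    def __init__(self, name):
        self.name = name

    def __le__(self, other):
        # Record the comparison instead of performing it.
        return Expr(operator.le, self, other)


class Expr:
    """Hypothetical graph node: an operator plus its unevaluated operands."""

    def __init__(self, op, left, right):
        self.op, self.left, self.right = op, left, right

    def evaluate(self, data):
        # Only here is the column looked up and the boolean mask materialized.
        values = data[self.left.name]
        return [self.op(v, self.right) for v in values]


mask_expr = Col("a") <= 3        # no data touched yet, just a graph node
data = {"a": [1, 5, 3], "b": [10, 20, 30]}
mask = mask_expr.evaluate(data)  # evaluation happens on demand
```

Since `mask_expr` is just a description of the comparison, an engine is free to inspect or rewrite it before any data is read, which is exactly the "chance to do something clever" that the eager mask rules out.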

To be even more specific, this is the way SQLAlchemy does it. You could have something like this:

```

from models import Contact

def fn(session):
  ...
  # builds a Query but doesn't evaluate; the same idea could
  # trivially be spelled Contact[Contact.name == 'John']
  filtered_contact_exp = session.query(Contact).filter(Contact.name == 'John')
  # actually evaluates (this is where the SQL gets emitted)
  filtered_contacts = filtered_contact_exp.all()
  return filtered_contacts
```

And SQLAlchemy knows not to actually trigger the evaluation until you do something like `.all()`. Why not adopt this kind of pattern with Polars?
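For what it's worth, the deferred-until-`.all()` pattern itself is tiny. Here's a rough sketch in plain Python; `LazyQuery` is a hypothetical stand-in I wrote for this comment, not SQLAlchemy's actual `Query` class, and it filters an in-memory list rather than emitting SQL.

```python
class LazyQuery:
    """Hypothetical deferred query: filter() records steps, all() runs them."""

    def __init__(self, rows, predicates=()):
        self._rows = rows
        self._predicates = tuple(predicates)

    def filter(self, predicate):
        # Return a new query with one more recorded step; nothing is scanned here.
        return LazyQuery(self._rows, self._predicates + (predicate,))

    def all(self):
        # The single point where rows are actually read and filtered.
        return [r for r in self._rows if all(p(r) for p in self._predicates)]


contacts = [{"name": "John"}, {"name": "Jane"}, {"name": "John"}]
query = LazyQuery(contacts).filter(lambda r: r["name"] == "John")  # still lazy
johns = query.all()                                                # evaluates
```

Everything before `.all()` only accumulates a description of the work, so the library gets to see the whole chain before deciding how to run it.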