Polars author here.

> Aren't these just if/else-if/else conditions? It seems more in line with mathematical/python convention to use if/else... am I missing something?

Yes, they are. But if you look at pandas, `f['a'] <= 3` creates a boolean mask eagerly, on the fly. Pandas has zero chance to do anything clever here. And yes, `when.then.otherwise` is exactly `if else`, but `if` and `else` are already keywords in Python, so we cannot use them. `when`, `then`, `otherwise` are close synonyms.

The benefit of the `when().then().otherwise()` expression is that it is lazy. We don't do anything until we need to materialize the result. Then the optimizer has a chance to see the query as a whole and determine whether the `mask` can be reused, is not needed, should be computed somewhere else, etc.

> Polars mutates the dataframe,

Almost all Polars methods are pure. No dataframe is mutated; a new dataframe is created instead.

> Is there a reason why Polars is trying to avoid this kind of filtering in the row/column indices.

Yes, there is: ambiguity. I want things to be explicit, so the method names should make clear whether you are selecting rows (`df.filter`), selecting columns (`df.select`), or slicing (`df.slice`). In pandas this can all be done with bracket notation. I often read code like `df[foo] = bar` and wondered what kind of datatype was stored in `foo`. Indexes have the same reading complexity: I often saw queries that showed a different outcome after a `reset_index` call.

I like things to be more explicit. This may cost some keystrokes, but future me/us can more easily understand what is going on.
Isn't this just an implementation detail? It seems like it wouldn't be tough to turn this into syntactic sugar rather than a forced eager evaluation. I.e., `f['a'] <= 3` could just as easily evaluate to a computation graph rather than to the result of evaluating that graph. For example, I could imagine something like so:
```
from polars.dataframe import LazyDataFrame, DataFrame
def fn():
```
This is a toy example, so I'm not sure if the part around evaluation makes complete sense, but it seems like how pandas eagerly evaluates the frame is a shortcoming of its implementation and model, rather than the syntactic sugar itself.
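To make the idea concrete, here is a minimal plain-Python sketch of bracket syntax that returns a graph node instead of a mask. Every name in it (`LazyFrame`, `Column`, `Compare`, `collect`) is hypothetical and not part of Polars or pandas.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Column:
    """A reference to a column by name; it holds no data."""
    name: str

    def __le__(self, other):
        # Record the comparison as a node instead of computing a mask.
        return Compare(operator.le, self, other)


@dataclass(frozen=True)
class Compare:
    """A deferred comparison between a column and a constant."""
    op: Callable
    column: Column
    value: object

    def evaluate(self, data: dict) -> list:
        # The boolean mask is only built here, on request.
        return [self.op(x, self.value) for x in data[self.column.name]]


class LazyFrame:
    """Hypothetical frame whose bracket access returns graph nodes."""

    def __init__(self, data: dict):
        self._data = data

    def __getitem__(self, name: str) -> Column:
        return Column(name)

    def collect(self, expr: Compare) -> list:
        return expr.evaluate(self._data)


f = LazyFrame({"a": [1, 2, 3, 4, 5]})
expr = f["a"] <= 3      # builds a Compare node; no mask exists yet
mask = f.collect(expr)  # evaluation happens only here
```

The point of the sketch is only that operator overloading lets `f["a"] <= 3` stay a description of work; whether that fits Polars' design goals is a separate question.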
To be even more specific, this is the way SQLAlchemy does it. You could have something like this:
```
from models import Contact
def fn():
```
And SQLAlchemy knows not to actually trigger the evaluation until you do something like `.all()`. Why not adopt this kind of pattern with Polars?
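For reference, a small self-contained sketch of that SQLAlchemy behaviour, assuming SQLAlchemy 1.4 or newer and an in-memory SQLite database. The `Contact` model is defined inline here, and its `name` and `age` fields are assumptions made up for the example.

```python
from sqlalchemy import Column, Integer, String, create_engine
from sqlalchemy.orm import Session, declarative_base

Base = declarative_base()


class Contact(Base):
    """Hypothetical model; the fields are made up for illustration."""
    __tablename__ = "contacts"
    id = Column(Integer, primary_key=True)
    name = Column(String)
    age = Column(Integer)


engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)

# The comparison yields a SQL expression object, not a boolean.
condition = Contact.age <= 3

with Session(engine) as session:
    session.add_all([Contact(name="Ada", age=36), Contact(name="Bo", age=2)])
    session.commit()

    # Building the query does not run the SELECT.
    query = session.query(Contact).filter(condition)

    # The SELECT is sent to the database when .all() is called.
    names = [contact.name for contact in query.all()]
```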