by ritchie46 1167 days ago
This is far more elegant in pandas due to the implicit behavior of the index.

But you can move the explicitness of polars behind a function. A more explicit API should not hurt maintainability if we structure our code right.

1 comments

So something like this?

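    import polars as pl

    def add(df1, df2, meta_cols, val_cols=None):
        # default to all non meta cols if val_cols is None
        val_cols = val_cols or [c for c in df1.columns if c not in meta_cols]
        # join on meta cols; df2's clashing val cols get the "_right" suffix
        joined = df1.join(df2, on=meta_cols, how="inner")
        # add val cols and return df with all meta and val cols selected
        return joined.select(list(meta_cols) + [(pl.col(c) + pl.col(f"{c}_right")).alias(c) for c in val_cols])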
In theory I think that's fine. The problem is that in practice this will cause a lot of visual noise in your models, since for every operation you would need to specify, at least, your meta columns, and potentially value columns too. If you change the dimensionality of your data, you would need to update everywhere you've specified them. You could get around this a bit by defining the meta columns in a constant, but that's really only maintainable at a global module level. Once you start passing dfs around, you'll have to pass the specified columns as packaged data around with the df as well. There's also the problem that you'd need to use functions instead of standard operators.

One thing that would be nice is to be able to set an "index" on the polars dataframe (forgive me, I understand the aversion to the word). Not a real index, just a list of columns that are designated as "metadata columns". This wouldn't affect any internal state of the data, but it would change the path that certain operations take. For example, if an "index" is set, then `+` does the join from above rather than the current standard `+` operation.

In any case, I realize this is a major philosophical divergence from the polars way of thinking, so I'm more just shooting the shit than offering real suggestions.