Polars is also usable as a Rust library. So, one can use that for static typing. Wonder what the downsides are - maybe losing access to the Python data science libraries.
Polars dataframes in Rust are still dynamically typed. For example:
let df = df![
"name" => ["Alice", "Bob", "Charlie"],
"age" => [25, 30, 35]
]?;
let ages = df.column("age")?;
There’s no Rust type-level knowledge of what type the “age” or “name” column is, for example. The result of df.column is a Series, which has to be cast to a Rust type based on the developer’s knowledge of what the column is expected to contain.
You can do things like this:
let oldies = df.filter(&df.column("age")?.gt(30)?)?;
So the casting can be automatic, but this will fail at runtime if the age column doesn’t contain numeric values.
One type-related feature that Polars does have: because the contents of a Series are represented as a typed Rust value, all values in a series must have the same type. This is a constraint compared to traditional dataframes, but it provides a performance benefit when processing large series. You can cast an entire Series to a typed Rust value efficiently, and then operate on the result in a typed fashion.
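To make the "check once, then work in typed code" idea concrete, here is a minimal dependency-free sketch. It is a toy model of the pattern, not the Polars API: the `Series` enum, `TypeMismatch`, `as_ints` and `count_older_than` are all hypothetical names invented for illustration.

```rust
// Hypothetical stand-in for a dynamically typed column. Not the Polars API.
// Every value in one column shares a single runtime type.
#[derive(Debug, Clone, PartialEq)]
enum Series {
    Int(Vec<i64>),
    Str(Vec<String>),
}

// Error returned when a column does not hold the type the caller expected.
#[derive(Debug, PartialEq)]
struct TypeMismatch {
    expected: &'static str,
    found: &'static str,
}

impl Series {
    // Name of the runtime type tag, used in error reporting.
    fn dtype(&self) -> &'static str {
        match self {
            Series::Int(_) => "int",
            Series::Str(_) => "str",
        }
    }

    // One runtime check for the whole column. On success the caller holds
    // a plain &[i64], so everything downstream is statically typed.
    fn as_ints(&self) -> Result<&[i64], TypeMismatch> {
        match self {
            Series::Int(values) => Ok(values.as_slice()),
            other => Err(TypeMismatch {
                expected: "int",
                found: other.dtype(),
            }),
        }
    }
}

// Counts values strictly greater than `threshold`. The type check happens
// once, up front, instead of once per element.
fn count_older_than(ages: &Series, threshold: i64) -> Result<usize, TypeMismatch> {
    let ages = ages.as_ints()?;
    Ok(ages.iter().filter(|&&age| age > threshold).count())
}
```

Calling `count_older_than` on a string column returns an `Err` rather than panicking, which is the same trade-off described above: the mismatch is still only detected at runtime, but it is detected in exactly one place.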
But as you said, you can’t use Python libraries directly with Polars dataframes. You’d need conversion and foreign function interfaces. If you need that, you’d probably be better off just using Python.
Pandas, dask, etc. also have runtime-typed columns (dtypes), which is even stronger in pandas 2 and when used with Arrow, where the typing extends to the data representation for interop/IO. (That is half of the performance trick of Polars.)
And yeah, my open question with all of these is: lacking dependent typing or an equivalent for row types, it's hard for mypy and friends to statically track that individual columns exist and have specific types. And even if we are willing to be explicit about wrapping each DF with a manual definition, basically an Arrow schema, I don't think any of these libraries make that convenient? (And is that natively supported by any?)
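For what the "manual definition per DF" approach could look like, here is a rough dependency-free sketch. It assumes a toy `Frame` that only records column names and runtime dtypes; `Frame`, `DType`, `PEOPLE_SCHEMA`, `People` and `frame_of` are hypothetical names, not taken from Polars, pandas or Arrow.

```rust
use std::collections::HashMap;

// Hypothetical runtime type tags for columns.
#[derive(Debug, Clone, Copy, PartialEq)]
enum DType {
    Int,
    Str,
}

// Toy stand-in for an untyped dataframe: column names and runtime dtypes only.
struct Frame {
    dtypes: HashMap<String, DType>,
}

// Helper to build a toy frame from (column name, dtype) pairs.
fn frame_of(columns: &[(&str, DType)]) -> Frame {
    let dtypes = columns
        .iter()
        .map(|(name, dtype)| (name.to_string(), *dtype))
        .collect();
    Frame { dtypes }
}

// The hand-written schema for one particular table.
const PEOPLE_SCHEMA: &[(&str, DType)] = &[("name", DType::Str), ("age", DType::Int)];

// Newtype wrapper: holding a `People` means the schema check has passed,
// so functions taking `People` can rely on the columns existing.
struct People(Frame);

impl People {
    // Validates the frame against PEOPLE_SCHEMA once, at the boundary.
    fn try_new(frame: Frame) -> Result<People, String> {
        for (name, expected) in PEOPLE_SCHEMA {
            match frame.dtypes.get(*name) {
                Some(actual) if actual == expected => {}
                Some(actual) => {
                    return Err(format!(
                        "column {name}: expected {expected:?}, found {actual:?}"
                    ))
                }
                None => return Err(format!("missing column {name}")),
            }
        }
        Ok(People(frame))
    }
}
```

The inconvenience complained about above is visible here: the schema has to be written and kept in sync by hand for every table, and the check still runs at runtime. The type system only records that the check happened.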
In louie.ai, we generate python for users, so we can have it generate the types as well... But we haven't found a satisfactory library for that so far...