by TheTaytay 473 days ago
It’s funny you mention Modal. I use Modal to do fan-out processing of large-ish datasets. Right now I store the transient data in DuckDB on Modal, using Polars (and sometimes Ibis) as my API of choice.

I did this rather than use Snowflake because our custom Python “user-defined functions” that process the data can't be deployed on Snowflake out of the box, and the ergonomics of shipping custom code to Modal are great. I’m willing to accept a bit more complexity in shipping data to Modal in exchange for those dev ergonomics.
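
For anyone unfamiliar with the pattern, here is a minimal local sketch of the fan-out described above. It uses the standard library's `ThreadPoolExecutor` as a stand-in for remote workers, so it is not Modal's API, and `process_chunk`, `chunked`, and `fan_out` are hypothetical names chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def process_chunk(rows):
    # Hypothetical user-defined function: keep even values and square them.
    return [r * r for r in rows if r % 2 == 0]


def chunked(rows, size):
    # Split the dataset into fixed-size chunks, one per worker call.
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def fan_out(rows, size=4, workers=4):
    # Fan out: run the UDF on every chunk concurrently.
    # Fan in: flatten the per-chunk results, preserving chunk order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = list(pool.map(process_chunk, chunked(rows, size)))
    return [value for part in parts for value in part]


if __name__ == "__main__":
    print(fan_out(list(range(10))))
```

In a real deployment the thread pool would be replaced by whatever remote execution layer ships the function and its dependencies to the workers; the chunk, map, and combine shape stays the same.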

All of that is to say: what does it look like to have custom Python code running on Polars Cloud in a distributed fashion? Is that a solved problem?

1 comment

Yes, you can run

`pc.remote(my_udf, schema)`

where `my_udf` has the signature

`def my_udf() -> DataFrame`

We link the appropriate Python version at cluster startup.