by kortex 3006 days ago
I spent half a day playing around with something very similar to this. I wanted a concise language for describing data pipelines in Pandas, and was (ab)using Python dunder methods (operator overloading) to this end. Like:

`data | groupby, "author" | mean`

That would create a graph object, which could be lazily evaluated, run in Dask, TF, etc.

It started to get ugly when passing multiple parameters into a function. I had to watch out for left and right associativity, and manage casting of arguments.
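To make the idea concrete, here is a rough sketch of how `|` overloading can build a lazily evaluated pipeline. It is not the code from that experiment: `Step`, `Pipeline`, `evaluate`, and the toy `groupby`/`mean` helpers are hypothetical names, and plain lists of dicts stand in for a DataFrame. In plain Python the comma binds more loosely than `|`, so this sketch binds arguments with a call, `groupby("author")`, rather than a comma.

```python
from collections import defaultdict
from statistics import mean as _average


class Step:
    """A deferred function call that can be chained with `|` (hypothetical helper)."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # Binding arguments returns a new Step instead of running anything.
        return Step(self.func, *args, **kwargs)

    def apply(self, value):
        return self.func(value, *self.args, **self.kwargs)


class Pipeline:
    """Records steps in order; nothing runs until evaluate() is called."""

    def __init__(self, source, steps=()):
        self.source = source
        self.steps = tuple(steps)

    def __or__(self, step):
        # `pipeline | step` appends to the recorded chain (left-associative).
        return Pipeline(self.source, self.steps + (step,))

    def evaluate(self):
        value = self.source
        for step in self.steps:
            value = step.apply(value)
        return value


def _group_rows(rows, key):
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)
    return dict(groups)


def _mean_of(groups, field):
    return {name: _average(r[field] for r in rows) for name, rows in groups.items()}


groupby = Step(_group_rows)
mean = Step(_mean_of)

rows = [
    {"author": "ann", "score": 1.0},
    {"author": "ann", "score": 3.0},
    {"author": "bob", "score": 5.0},
]

pipeline = Pipeline(rows) | groupby("author") | mean("score")  # nothing computed yet
result = pipeline.evaluate()
```

Because the pipeline only stores the steps, the same recorded chain could in principle be handed to a different backend instead of being run eagerly; that part is left out here.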

It was a fun little experiment but I'm not sure how much it would actually improve workflows. If that sounds interesting, let me know and I'll poke at it again.

1 comment

I'm not sure if you are aware, but there are several efforts out there to give Python a more data-pipeline-friendly (composable pipe) syntax:

1) Coconut: http://coconut-lang.org/

2) https://github.com/JulienPalard/Pipe

3) Pandas also has a new DataFrame `pipe` method: https://pandas.pydata.org/pandas-docs/stable/generated/panda...

I would look at those before rolling out a custom solution.
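For the Pandas option, a minimal sketch of what `DataFrame.pipe` looks like in use. The `mean_by` helper and the sample data are made up for illustration; `pipe` itself just calls the function with the DataFrame as the first argument and forwards the rest.

```python
import pandas as pd


def mean_by(df, key, column):
    """Hypothetical helper: average `column` within each group of `key`."""
    return df.groupby(key)[column].mean()


df = pd.DataFrame(
    {"author": ["ann", "ann", "bob"], "score": [1.0, 3.0, 5.0]}
)

# Equivalent to mean_by(df, "author", "score"), but reads left to right
# and chains with other DataFrame methods.
result = df.pipe(mean_by, "author", "score")
```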