| I have a tool [1] that is tackling some of the same problems in a different way. I had some core views that shaped what I built. 1. When doing data manipulation, especially initial exploration and cleaning, we type the same things over and over. Being proficient with pandas involves a lot of recognition of patterns, and hopefully remembering one with well written code (like you would read in Effective Pandas). 2. pandas/polars is a huge surface space in terms of API calls, but rarely are all of those calls relevant. There are distinct operations you would want on a datetime column, a string column or an int column. The traditional IDE paraidgm is a bit lacking for this type of use (python typing doesn't seem to utilize the dtype of a column, so you see 400 methods for every column). 3.It is less important for a tool to have the right answer out of the box, vs letting you cycle through different views and transforms quickly. ------ I built a low code UI for Buckaroo that has a DSL (JSON Lisp) that mostly specifies transform, column name, and other arguments. These operations are then applied to a dataframe, and separately the python code is generated from templates for each command. I also have a facility for auto-cleaning that heuristically inspects columns and outputs the same operations. So if a column has 95% numbers and 1% blank strings, that should probably be treated as a numeric column. These operations are then visible in the lowcode UI. Multiple cleaning methods can be tried out (with different thresholds). [1] https://github.com/paddymul/buckaroo [2] https://youtu.be/GPl6_9n31NE?si=YNZkpDBvov1lUYe4&t=603 Demonstrating the low code UI and autocleaning in about 3 minutes [3] There are other related tools in this space, specifically visidata and dtale. They take different approaches which are worth learning from. ps: I love this product space and I'm eager to talk to anyone building products in this area. |
I wish multiple ways of interacting with data can co-exist seamlessly in some sort of future tool (without overwhelming users (?)) :)