by paddy_m 611 days ago
I have a tool [1] that is tackling some of the same problems in a different way.

I had some core views that shaped what I built.

1. When doing data manipulation, especially initial exploration and cleaning, we type the same things over and over. Being proficient with pandas involves a lot of pattern recognition, and hopefully remembering a well-written version of each pattern (like you would read in Effective Pandas).

2. pandas/polars have a huge API surface, but rarely are all of those calls relevant. There are distinct operations you would want on a datetime column, a string column or an int column. The traditional IDE paradigm is a bit lacking for this type of use (Python typing doesn't seem to utilize the dtype of a column, so you see 400 methods for every column).

3. It is less important for a tool to have the right answer out of the box than to let you cycle through different views and transforms quickly.
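
A rough sketch of point 2, not anything taken from Buckaroo: pick the operations to offer from the column's dtype instead of listing every method. The `OPS_BY_KIND` menu and both function names are made up for illustration, and treating every object column as a string column is a simplification.

```python
import pandas as pd
from pandas.api import types as ptypes

# Hypothetical menu: which operations to offer for each kind of column.
OPS_BY_KIND = {
    "datetime": ["dt.year", "dt.month", "dt.day_name"],
    "string": ["str.strip", "str.lower", "str.split"],
    "numeric": ["round", "clip", "abs"],
}


def column_kind(series: pd.Series) -> str:
    """Classify a column by dtype so only relevant operations are shown."""
    if ptypes.is_datetime64_any_dtype(series):
        return "datetime"
    if ptypes.is_bool_dtype(series):
        return "other"  # bool counts as numeric in pandas, so check it first
    if ptypes.is_numeric_dtype(series):
        return "numeric"
    if ptypes.is_string_dtype(series) or ptypes.is_object_dtype(series):
        return "string"  # simplification: object columns are assumed to hold text
    return "other"


def relevant_ops(df: pd.DataFrame) -> dict:
    """Map each column name to the short list of operations worth offering."""
    return {col: OPS_BY_KIND.get(column_kind(df[col]), []) for col in df.columns}


example = pd.DataFrame(
    {
        "when": pd.to_datetime(["2024-01-01", "2024-02-01"]),
        "name": ["a", "b"],
        "n": [1, 2],
    }
)
print(relevant_ops(example))
```

A UI could use a lookup like this to show three or four candidate transforms per column rather than the full method list.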

------

I built a low-code UI for Buckaroo that has a DSL (JSON Lisp) that mostly specifies transform, column name, and other arguments. These operations are then applied to a dataframe, and separately the Python code is generated from templates for each command.
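
To make the idea concrete, here is a minimal sketch of a JSON command list that is both applied to a dataframe and rendered to code from per-command templates. The `[transform, column, *args]` shape, the command names, and `run_program` are assumptions made for this example; Buckaroo's real DSL and templates may look quite different.

```python
import json

import pandas as pd

# Hypothetical commands: each name has a function that transforms the
# dataframe and a template that renders equivalent pandas code.
TRANSFORMS = {
    "dropcol": lambda df, col: df.drop(columns=[col]),
    "fillna": lambda df, col, value: df.assign(**{col: df[col].fillna(value)}),
    "to_numeric": lambda df, col: df.assign(
        **{col: pd.to_numeric(df[col], errors="coerce")}
    ),
}
TEMPLATES = {
    "dropcol": "df = df.drop(columns=[{0!r}])",
    "fillna": "df[{0!r}] = df[{0!r}].fillna({1!r})",
    "to_numeric": "df[{0!r}] = pd.to_numeric(df[{0!r}], errors='coerce')",
}


def run_program(df, program):
    """Apply each [transform, column, *args] command; return (new_df, code)."""
    lines = []
    for name, col, *args in program:
        df = TRANSFORMS[name](df, col, *args)  # apply to the data
        lines.append(TEMPLATES[name].format(col, *args))  # render the code
    return df, "\n".join(lines)


program = json.loads(
    '[["to_numeric", "price"], ["fillna", "price", 0], ["dropcol", "notes"]]'
)
raw = pd.DataFrame({"price": ["1.5", "", "3"], "notes": ["a", "b", "c"]})
cleaned, code = run_program(raw, program)
print(code)
```

Keeping the transform and its template side by side means the generated code and the previewed dataframe come from the same command list, so a UI only has to edit that list.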

I also have a facility for auto-cleaning that heuristically inspects columns and outputs the same operations. So if a column has 95% numbers and 1% blank strings, that should probably be treated as a numeric column. These operations are then visible in the low-code UI. Multiple cleaning methods can be tried out (with different thresholds).
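
A minimal sketch of that kind of heuristic, assuming a made-up `[transform, column]` command shape and a hypothetical `suggest_cleaning` helper (this is not Buckaroo's implementation): measure what fraction of a text column parses as a number and suggest a conversion when it clears a threshold.

```python
import pandas as pd
from pandas.api import types as ptypes


def suggest_cleaning(df, numeric_threshold=0.9):
    """Suggest [transform, column] commands for text columns that are mostly numeric."""
    commands = []
    for col in df.columns:
        series = df[col]
        # Only inspect text-like columns; numeric and datetime columns are left alone.
        if not (ptypes.is_object_dtype(series) or ptypes.is_string_dtype(series)):
            continue
        parsed = pd.to_numeric(series, errors="coerce")  # unparseable values become NaN
        if parsed.notna().mean() >= numeric_threshold:
            commands.append(["to_numeric", col])
    return commands


# 19 numeric strings and 1 blank: 95% of the column parses as a number.
messy = pd.DataFrame(
    {"qty": [str(i) for i in range(19)] + [""], "name": ["x"] * 20}
)
print(suggest_cleaning(messy))
print(suggest_cleaning(messy, numeric_threshold=0.99))
```

Because the output is a list of commands rather than a modified dataframe, running it with different thresholds gives alternative cleaning proposals that can be compared or edited before anything is applied.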

[1] https://github.com/paddymul/buckaroo

[2] https://youtu.be/GPl6_9n31NE?si=YNZkpDBvov1lUYe4&t=603 Demonstrating the low code UI and autocleaning in about 3 minutes

[3] There are other related tools in this space, specifically visidata and dtale. They take different approaches which are worth learning from.

ps: I love this product space and I'm eager to talk to anyone building products in this area.

1 comment

this is really really cool! working directly with the table is sometimes the only way to clean the data as well :)

I wish multiple ways of interacting with data could coexist seamlessly in some sort of future tool (without overwhelming users?) :)

To your point, LLM-based approaches have a huge adoption advantage in that you don't need to understand a lot to type into a text box.

A tool like Buckaroo requires an upfront investment in learning where to click and how to read the output.