Hacker News new | ask | show | jobs
by maxerickson 3350 days ago
What have you wanted to use it for?

Pandas is basically an R data frame for Python. A sloppy description of that is a text mode spreadsheet.

The description of Bonobo doesn't immediately invite the comparison to Pandas, to me anyway.

2 comments

You're very right, as I'm using both pandas and bonobo for different reasons.

Mostly, when I want a quasi-mathematical look over a dataset, pandas is my tool of choice. For all those data pipeline things that reasonably fit on one computer, I do use bonobo.

I'm an avid Pandas user. Some stuff at work has come up recently that calls for ETL - and I'm trying to figure out what the best tools are. Is any of your Bonobo code public? I'd be curious to see what a real-life project looks like...
No, I don't have real-life public code available. I'm gonna see what I can extract from old commercial project for publication, but I can't guarantee anything.
Luigi is a simple tool from Spotify that seems to solve a similar data workflow (with DAGs) problem as Bonobo. Airflow from AirBnB is a more complex tool and I've understood that Spotify has lately moved from Luigi to Airflow.

Can anybody comment how Bonobo compares to Luigi?

Usually just simple data analysis, really nothing far outside of the 'statistics' lib. Currently it's more the exploration and discovery part of the exercise I'm struggling with. I've got a few hundred thousand csv files representing various aspects of Australia's national energy market (e.g. outcomes of 5 minute supply auctions). I'm trying to make my way through that, figure out what's relevant and wrangle the relevant stuff in some organised fashion.

Is pandas the wrong kind of tool for this type of thing? Going off what rdorgueil has said, I'm beginning to suspect so. Is there a data-wrangling 'gold standard' library for python?

I'm just learning pandas as well but I think it is the right tool for the job. I am using django-pandas so I can do easy ORM stuff. If I were to sketch out your use case:

Create a object/class called

    AuctionResult
     - some datetime
     - value
Then you'd query it qs = AuctionResult.objects.all()

then you load it into a pandas dataframe:

df = read_frame(qs)

After that you can do all sorts of the fun stuff I imagine.

I don't see why pandas won't work for your case. It sounds like most if not all the csvs contain the same columns and type of data. You could easily create a pandas dataframe that combines them all, then use any plotting library like matplotlib and/or seaborn to plot. If you need help provide some examples of the csvs you are trying to parse.
Pandas is definitely the most popular (and imo best) data wrangling library for Python.