Hacker News
by spangry 3350 days ago
Usually just simple data analysis, nothing far outside of the 'statistics' lib. Right now it's the exploration and discovery part of the exercise I'm struggling with. I've got a few hundred thousand CSV files representing various aspects of Australia's national energy market (e.g. outcomes of 5-minute supply auctions). I'm trying to make my way through them, figure out what's relevant, and wrangle the relevant stuff into some organised form.

Is pandas the wrong kind of tool for this type of thing? Going off what rdorgueil has said, I'm beginning to suspect so. Is there a data-wrangling 'gold standard' library for python?
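One low-tech way to start the discovery part is to group the files by their header row, so you can see how many distinct layouts there are and which files can be wrangled together. This is a minimal sketch that assumes each file's first row is a header (if the files use a different layout, the key would need changing); `group_by_header` and `summarize` are hypothetical helper names, and it only uses the standard library:

```python
import csv
from collections import defaultdict
from pathlib import Path


def group_by_header(root):
    """Map each distinct header row (as a tuple) to the CSV files that use it.

    Assumes the first row of every file is its header; empty files
    (or files whose first line is blank) are grouped under ().
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*.csv")):
        with path.open(newline="", encoding="utf-8", errors="replace") as f:
            header = next(csv.reader(f), None)  # None when the file is empty
        key = tuple(header) if header else ()
        groups[key].append(path)
    return dict(groups)


def summarize(groups):
    """Return (file_count, header) pairs, most common layout first."""
    return sorted(((len(p), h) for h, p in groups.items()), key=lambda t: -t[0])
```

Running `summarize(group_by_header("some_dir"))` on the data directory would list the most common layouts first, which are probably the ones worth loading into a single table.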


I'm just learning pandas as well, but I think it's the right tool for the job. I use django-pandas so I can do easy ORM stuff. If I were to sketch out your use case:

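Create a Django model called AuctionResult, with a datetime and a value:

    from django.db import models

    class AuctionResult(models.Model):
        timestamp = models.DateTimeField()  # "some datetime"; field name is up to you
        value = models.FloatField()  # or DecimalField if you need exact prices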
Then you'd query it:

    qs = AuctionResult.objects.all()

Then you load it into a pandas DataFrame with django-pandas:

    from django_pandas.io import read_frame

    df = read_frame(qs)

After that, I imagine, you can do all sorts of fun stuff.
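As one example of that fun stuff: assuming the frame ends up with a datetime column (called `timestamp` here, which is my own naming, not something fixed by the data) and a numeric `value` column, like the AuctionResult sketch, pandas can roll the 5-minute rows up into hourly summaries with `resample`. `hourly_summary` is a hypothetical helper:

```python
import pandas as pd


def hourly_summary(df):
    """Collapse 5-minute rows into the hourly mean/min/max of `value`.

    Assumes `df` has a datetime-like `timestamp` column and a numeric
    `value` column.
    """
    series = df.set_index(pd.to_datetime(df["timestamp"]))["value"].sort_index()
    return series.resample("1h").agg(["mean", "min", "max"])


# Tiny synthetic example: 24 five-minute intervals = 2 hours.
example = pd.DataFrame({
    "timestamp": pd.date_range("2017-01-01", periods=24, freq="5min"),
    "value": range(24),
})
print(hourly_summary(example))
```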

I don't see why pandas wouldn't work for your case. It sounds like most, if not all, of the CSVs contain the same columns and type of data. You could easily create a pandas DataFrame that combines them all, then use a plotting library such as matplotlib and/or seaborn to plot it. If you need help, post some examples of the CSVs you're trying to parse.
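If the files do share columns, combining them could look something like this minimal sketch. `combine_csvs` is a hypothetical helper, and the extra `source_file` column is just a convenience for tracing odd rows back to the file they came from:

```python
from pathlib import Path

import pandas as pd


def combine_csvs(paths):
    """Read CSV files that share a layout and stack them into one DataFrame.

    Adds a `source_file` column holding each row's file name.
    Note: pd.concat raises ValueError if `paths` is empty.
    """
    frames = [pd.read_csv(p).assign(source_file=Path(p).name) for p in paths]
    return pd.concat(frames, ignore_index=True)
```

Something like `combine_csvs(sorted(Path("some_dir").glob("*.csv")))` would then give you one frame to plot. Collecting the frames in a list and calling `pd.concat` once is much faster than concatenating inside the loop, which copies the growing frame on every iteration.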

Pandas is definitely the most popular (and IMO the best) data-wrangling library for Python.