Hacker News
by infinite8s 3360 days ago
I've been prototyping simple desktop GUI tools on top of dask/pandas and PyQt that let you lazily load large CSVs (and other formats supported by pandas) and interactively filter based on smart histograms (the per-column histograms are fully interactive and provide crossfiltering across the attributes):

http://imgur.com/a/vfAmV

The idea is to map a lot of the basic functionality of dataframes onto simple GUI interactions (for example, changing column types, stacking and unstacking columns, pivoting) and couple that with an IPython console for more complicated data manipulation. And then maybe even adding Tableau-like charting functionality:

http://imgur.com/a/z8d1w

This is meant for quick throwaway exploration/analysis. It can easily handle about a million rows using just generic pandas and a bit of memory. There are lots of cool database techniques that can also be used on small local data (for example, compressed bitmaps using EWAHBool for interactive filtering).
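
To make the crossfiltering idea concrete, here is a rough sketch, not the actual tool's code. It uses plain NumPy boolean masks as an uncompressed stand-in for the compressed bitmaps mentioned above. The function name `crossfilter_counts` and the `selections` format are made up for illustration, and it assumes numeric columns without missing values.

```python
import numpy as np
import pandas as pd


def crossfilter_counts(df, selections, bins=10):
    """Histogram each numeric column using rows selected on all *other* columns.

    `selections` maps a column name to an inclusive (low, high) range.
    Hypothetical helper for illustration; assumes no NaN values.
    """
    # One boolean mask per filtered column (an uncompressed "bitmap").
    masks = {
        col: df[col].between(low, high).to_numpy()
        for col, (low, high) in selections.items()
    }
    histograms = {}
    for col in df.select_dtypes("number").columns:
        # AND together the masks of every other column, so a column's own
        # selection does not remove the bars outside its selected range.
        keep = np.ones(len(df), dtype=bool)
        for other, mask in masks.items():
            if other != col:
                keep &= mask
        # Take bin edges from the full column so bars stay put while filtering.
        edges = np.histogram_bin_edges(df[col].to_numpy(), bins=bins)
        counts, _ = np.histogram(df.loc[keep, col].to_numpy(), bins=edges)
        histograms[col] = (counts, edges)
    return histograms
```

Each brush on a histogram only has to rebuild one mask; the other masks are reused and combined with a bitwise AND, which is the part a compressed bitmap library would speed up.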

1 comments

Do you plan to release something soon? Even just a way to visualize the rows by loading them lazily would be a huge improvement. I personally use pandas but some of my colleagues are not familiar with it, and it pains me when they try to inspect a large dataset by opening it in Excel on our small university-provided desktops instead of spending a couple of minutes writing a few Python lines to extract what they need.
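
For reference, the "few Python lines" could look roughly like the sketch below. It streams the file with the `chunksize` option of `pandas.read_csv`, so the whole CSV never has to fit in memory at once. The helper name `extract_rows` and the `predicate` argument are my own, not from any library.

```python
import pandas as pd


def extract_rows(csv_path, predicate, chunksize=100_000):
    """Stream a CSV in chunks and keep only rows where `predicate` is True.

    `predicate` takes a DataFrame chunk and returns a boolean Series.
    Hypothetical helper for illustration.
    """
    kept = []
    # With chunksize set, read_csv yields DataFrames of at most that many
    # rows instead of loading the whole file at once.
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        kept.append(chunk[predicate(chunk)])
    if not kept:
        # Nothing was read at all; return an empty frame rather than fail.
        return pd.DataFrame()
    return pd.concat(kept, ignore_index=True)
```

Something like `extract_rows("big.csv", lambda c: c["year"] == 2016).to_csv("subset.csv", index=False)` (file and column names invented here) would then give a file small enough to open in Excel.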
It sounds like CSV Explorer might work well for them.