by TedPetrou 2204 days ago
Hey everyone, I'm the author of Dexplot. I have written large sections of books on seaborn, have taught it in classes for years, and had many issues with it, some of which are outlined below:

• Not allowed to set figure size

• No wrapping of tick labels

• No strings for pandas aggregation functions

• No automatic ordering of x/y labels (dexplot provides several options)

• Having to use separate grid functions (catplot, lmplot) for multiple subplots

• Something like 5 different functions for scatterplots. Dexplot has one

• No relative frequency bar charts, which are a fantastic way to explore data. Dexplot provides normalization over any set of variables

• No stacked bar charts

• Seaborn docs have distribution plots (box, violin) in the "categorical" section. A major distinction needs to be made between plots that aggregate, those that show distributions, and those that plot raw data (like scatterplots)

• Returning of matplotlib axes or seaborn grid objects. Dexplot always returns the matplotlib figure

• Seaborn is essentially dead as far as I can tell with few changes in the last 2-3 years. There are even parameters that continue to be non-functional
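To illustrate the relative-frequency idea from the list above, here is a minimal sketch that normalizes counts within each group using only the standard library. It is not dexplot's implementation or API; the function name `relative_frequencies` and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict


def relative_frequencies(records, group_key, value_key):
    """Return {group: {value: share}}; shares within each group sum to 1."""
    counts = defaultdict(Counter)
    for row in records:
        counts[row[group_key]][row[value_key]] += 1

    result = {}
    for group, counter in counts.items():
        total = sum(counter.values())
        result[group] = {value: n / total for value, n in counter.items()}
    return result


# Hypothetical sample data, not taken from the thread
records = [
    {"city": "Houston", "room": "Private"},
    {"city": "Houston", "room": "Entire home"},
    {"city": "Houston", "room": "Entire home"},
    {"city": "Austin", "room": "Private"},
]
freqs = relative_frequencies(records, "city", "room")
```

Plotting these shares instead of raw counts is what turns an ordinary bar chart into a relative-frequency one; normalizing over a different key only changes which totals are used as the denominator.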

In the future, Dexplot will add:

• Many more plotting functions

• Several apps (built from ipywidgets) to explore data. Currently, there is one for viewing colors

• Better automatic figure sizing (it exists now, but will be improved)

• Automatic DPI detection so that matplotlib inches correspond to actual screen inches
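The DPI item comes down to simple arithmetic: a figure's pixel size is its size in inches times its dpi, and how large that looks depends on the screen's real pixel density. The sketch below shows that arithmetic only. It is not how dexplot detects DPI; the helper names are hypothetical, and it ignores operating-system display scaling.

```python
import math


def screen_ppi(width_px, height_px, diagonal_inches):
    """Physical pixels per inch of a display, from its resolution and diagonal."""
    return math.hypot(width_px, height_px) / diagonal_inches


def on_screen_inches(figure_inches, figure_dpi, ppi):
    """Physical size at which a figure dimension appears on a given screen."""
    return figure_inches * figure_dpi / ppi


# Hypothetical monitor: 1920x1080 pixels on a 24-inch diagonal
ppi = screen_ppi(1920, 1080, 24)

# A 6-inch-wide figure rendered at dpi=100 occupies 600 pixels,
# which is more than 6 physical inches on this lower-density screen.
apparent_width = on_screen_inches(6, 100, ppi)
```

Setting the figure dpi equal to the screen's pixel density is what would make one matplotlib inch match one physical inch.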

Dexplot aims to be intuitive, easy to use, and consistent, and to allow easy exploration (the name is a smashing together of data exploration plotting).

Here is one example comparison between dexplot and seaborn. https://twitter.com/TedPetrou/status/1271436948721328129

Examples such as these are what drove me to create the library.

I'd love to get feedback and happy to take detailed criticism.

4 comments

> Seaborn is essentially dead as far as I can tell with few changes in the last 2-3 years. There are even parameters that continue to be non-functional

https://seaborn.pydata.org/whatsnew.html

Seaborn has received a couple of updates this year. I'm also not sure what you mean by not being able to set the figure size. The ways to do so are inconsistent, but they're there.

> I'd love to get feedback and happy to take detailed criticism.

I like your syntax a lot. This page isn't a good way to show it. Seaborn's gallery page is excellent, even if redundant at times. I would dedicate more time to creating easily usable docs like that. Docs are almost everything when it comes to charting.

I'd also like to see documentation on how to control aesthetics such as color, outlines, and style.

Technically, there was a new "major" release with version 0.10, but it contained only some bug fixes and was otherwise the same as 0.9.1. The last release with anything new was in July of 2018. Given the rate of the last several releases, I don't expect much to happen for a while, thus "essentially dead".

You cannot control the figure size of axes-level plots from seaborn directly. You have to access the figure from the axes (which most people don't know how to do) or create the figure first by importing matplotlib. That is really annoying for those who just want to analyze data quickly. Grid plots have the ability to adjust figure size, but they return a seaborn object and not a matplotlib figure.
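For readers unfamiliar with the two workarounds described above, here is a minimal sketch using plain matplotlib. It assumes matplotlib is installed and uses `ax.bar` as a stand-in for a seaborn axes-level call, since those functions accept an `ax=` argument and return a matplotlib axes. The data is made up.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

categories = ["a", "b", "c"]  # hypothetical data
values = [3, 5, 2]

# Workaround 1: create the figure first with the desired size,
# then draw onto its axes (with seaborn, pass ax=ax1 to the plotting call).
fig1, ax1 = plt.subplots(figsize=(8, 3))
ax1.bar(categories, values)

# Workaround 2: start from an axes object (what axes-level seaborn
# functions return) and reach back to its parent figure to resize it.
fig2, ax2 = plt.subplots()
ax2.bar(categories, values)
ax2.figure.set_size_inches(8, 3)
```

Either way the user has to touch matplotlib, which is the friction being described.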

Agreed, docs need to get better. Better datasets, a gallery, etc... I've only spent a week on this, so there will be a lot of improvements in the future.

Hey for what it's worth I think this is really impressive for only a week!
Good job, this looks extremely expressive and with fewer corner cases ("dark knowledge") than MPL/SNS, and unlike Altair/Plotly doesn't require a whole browser to display the output!

Still, I'd like to ask if you considered alternatives to MPL for the back-end. It's a venerable but ancient project with years of accumulated technical debt, and I'm sure you had to deal with lots of inconsistencies there.

For example, PyQtGraph is an alternative with a clear class hierarchy and can handle large-scale datasets without slowing down (while anything non-trivial in MPL has you wait seconds to render).

(I'd love to hear more suggestions that don't require a JS engine and don't build on MPL.)

Thanks! I'm focused on building the user-facing API, as this is where I believe I'm best suited to make improvements due to my experience teaching and writing.

I'm definitely open to looking at alternative backends in the future and will check out PyQtGraph, but am sticking to matplotlib for now.

Given this, would you say your goal with Dexplot is to be a better Seaborn (and to replace it, given its development state), covering basically the same use cases but with the improvements you describe?

Thanks for the effort. Looks like a great project.

Correct, I'd like dexplot to be a superset of seaborn first, making it much easier to use for those who don't want to dip into matplotlib to make the minor adjustments that are necessary for most plots (figsize, tick labels, etc.).

There should be a library to do exploratory data analysis quickly, without having to touch matplotlib, numpy, or pandas, and without installing something like pandas-profiling to make reports.

This is where the apps will come in to allow users to quickly generate reports on things like missing values, duplicate rows/columns, outliers/bad data, view different colors, etc...

To be honest, I thought the figure on the right was your proposal. You'd better improve the defaults. Plus, when you compare, the data should be the same.

The data is the same. Dexplot automatically sorts the xtick labels alphabetically. Seaborn uses order of appearance. For the seaborn plot, the figure size and dpi have to be manually adjusted and there is no option to wrap the tick labels. They are a mess and overlap one another. The tick label wrapping is a huge win imo; otherwise you have to rotate them, which makes long labels look terrible.
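The two behaviors described here, alphabetical ordering and label wrapping, can be sketched with the standard library alone. This is an illustration of the idea, not dexplot's code; `wrap_label`, the width of 10, and the sample labels are assumptions.

```python
import textwrap


def wrap_label(label, width=10):
    """Break a long tick label into lines of at most `width` characters."""
    return "\n".join(textwrap.wrap(label, width=width))


# Hypothetical labels, listed in order of appearance in the data
labels = ["Shaw, Logan Circle", "Capitol Hill", "Columbia Heights, Mt. Pleasant"]

ordered = sorted(labels)  # alphabetical order instead of order of appearance
wrapped = [wrap_label(label) for label in ordered]
```

With matplotlib, the wrapped strings could then be passed to `ax.set_xticklabels`, which renders the embedded newlines as multi-line labels and avoids rotating them.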