by ms013 2770 days ago
Yup. A likely result is that if you pick one and spend the time to learn it and use it for a project, there's a non-trivial chance that the choice you make will join the ever-growing collection of library abandon-ware in the not too distant future.

This is why my favorite Python visualization tools are not Python - I've been burned too many times by libraries coming and going, and I just don't have the time to spend farting around trying to track the latest library fads.

5 comments

Argh, it's so frustrating to see this kind of sentiment. We're truly spoiled by choice. The libraries don't all serve the same purpose, and not everyone needs every library. This is literally a guide to help pick which one I may want. What more could one ask for??

That said, I understand it sucks to build on top of someone else's code only to have it be discontinued, but honestly it only takes a few minutes to judge a project's maturity. Here are some very easy rules of thumb. Don't write a lot of important code relying on a library that:

- Is younger than 3 years old.
- Has less than 3-4 major contributors and 20 overall contributors.
- Has lost steam: a lot of issues and pull requests open and stale with no triage tags, no discussion or responses.
- Doesn't seem to have any automated testing / packaging infrastructure set up.
- Doesn't seem to have a regular ongoing cycle of releases, be it long or short.
- Doesn't have nicely laid out documentation.
- Doesn't seem to have a user base, as indicated by a preponderance of questions and answers on stackoverflow, etc.
- Just posted their "we made a cool new thing" post on HN a few weeks ago.
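
For what it's worth, the checklist above can be sketched as a small script. This is only an illustration: the `ProjectStats` fields and the `red_flags` helper are hypothetical names, the numbers have to be collected by hand, and the cutoffs (including reading "a few weeks" as 8 weeks and treating the contributor rule as "fewer than 3 major or fewer than 20 overall") are my own interpretation of the rules of thumb.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ProjectStats:
    # Facts collected by hand from the repo, docs and Q&A sites.
    # All field names here are hypothetical.
    age_years: float
    major_contributors: int
    total_contributors: int
    stale_backlog: bool          # many open issues/PRs with no triage or replies
    has_automated_tests: bool    # CI / packaging infrastructure
    regular_releases: bool
    good_docs: bool
    visible_user_base: bool      # e.g. plenty of Q&A on Stack Overflow
    weeks_since_hn_launch: Optional[float] = None  # None: no recent launch post


def red_flags(p: ProjectStats) -> List[str]:
    """Return the rules of thumb that the project trips (empty list = none)."""
    flags = []
    if p.age_years < 3:
        flags.append("younger than 3 years")
    if p.major_contributors < 3 or p.total_contributors < 20:
        flags.append("too few contributors")
    if p.stale_backlog:
        flags.append("lost steam")
    if not p.has_automated_tests:
        flags.append("no automated testing/packaging")
    if not p.regular_releases:
        flags.append("no regular releases")
    if not p.good_docs:
        flags.append("poor documentation")
    if not p.visible_user_base:
        flags.append("no visible user base")
    # "A few weeks" is read as 8 weeks here, an arbitrary assumption.
    if p.weeks_since_hn_launch is not None and p.weeks_since_hn_launch < 8:
        flags.append("just launched on HN")
    return flags
```

An empty list doesn't prove a library is safe to build on; it just means none of the cheap warning signs fired.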

Yes, the cutoffs are arbitrary (and flexible!), but this hasn't failed me yet. The python world is filled with many wonderful and mature libraries. It just also has a lot of up and coming, promising young ones. Use whichever!

> What more could one ask for??

A refactoring and merging of these libraries to enable fewer people to maintain more functionality, such that the bus-factor of any given part of the functionality is higher.

That's entirely backwards. To raise the bus factor, you need more people maintaining less, not fewer people maintaining more.

In any case, the incentives are simply not there, as different projects have very different priorities (which is why there are so many projects). Some folks want to monetize their special sauce (Plotly). Some folks want to focus on high-level statistical charting (Altair, Chartify). Some folks want to focus on interactive data exploration (Holoviews). Some folks want to focus on high performance and streaming (Bokeh). Some folks want to focus on high-quality static image generation (MPL). The human and economic cost of getting all those groups under one tent is astronomical.

> you need more people maintaining less, not fewer people maintaining more

I see how you got confused, but I was making two separate assertions.

Assertion 1: If you merge the library projects together, and strip out the redundancies between them, then you'll have the same number of contributors, now distributed over fewer total lines of code. So each contributor can learn more of the codebase. (Bus factor goes up.)

Assertion 2: If you refactor the resulting libraries to reduce the total complexity (i.e. reduce the API surface from that of the union of all the merged-in libs), then you can begin to strip out technical debt from the project from the outside in. By eliminating now-dead code, you remove places where bugs can arise, and you lower the number of dependencies (which could otherwise have been sources of API-breaking changes when they update.) Thus, the number of people needed to maintain the project goes down. So the same total functionality can then be maintained by fewer contributors. (You don't actually remove contributors; they just become reserve capacity, with each contributor able to be less-overworked for the same result.)
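
To put a rough number on "bus factor" in this argument, here is a minimal sketch. It assumes one common heuristic that nobody in this thread actually defined: the smallest number of contributors who together account for more than half of the commits. The `bus_factor` function, the `threshold` parameter and the sample commit counts are all made up for illustration.

```python
def bus_factor(commits_by_author, threshold=0.5):
    """Smallest number of authors whose commits exceed `threshold` of the total."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0
    covered = 0
    # Walk authors from most to least active, counting how many are needed.
    ranked = sorted(commits_by_author.values(), reverse=True)
    for count, commits in enumerate(ranked, start=1):
        covered += commits
        if covered > threshold * total:
            return count
    return len(ranked)


# Hypothetical numbers: one dominant maintainer vs. knowledge spread evenly.
lopsided = {"a": 90, "b": 5, "c": 5}
spread = {"a": 30, "b": 30, "c": 20, "d": 20}
print(bus_factor(lopsided), bus_factor(spread))
```

Commit counts are a crude proxy for who understands the code, so treat the result as a conversation starter rather than a measurement.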

> If you merge the library projects together

Alas, it's not so simple. How much actual overlap is there to merge in Bokeh and Matplotlib, for instance? MPL renders images in Python and has no JS component. Bokeh does all of its rendering in JavaScript! The Python API is mostly just a thin wrapper around BokehJS. MPL has no server component at all. Neither of them does what Datashader does for large data sets. Neither of them has a high-level statistical charting API; that's only in Seaborn or Chartify. Merging all these things together would cost a fortune in time and money, and at the end of the day not actually reduce the total code size to any appreciable degree.

Then again, if you learn one well it's not too difficult to learn another one as well. And once you've worked with a couple different libraries, you start to understand the different data models and it's even easier to adapt to the next library.

Also, for many of the visualizations I've worked with I'm fine to use an older library that's reached a stable point of development. It feels like the whole conversation about whether to build your new project on the latest web framework, or use something old and established like Django or Rails. Boring, relatively stable software in any area is a pleasure to work with if you really want to focus on your real-world problem more than you want to focus on newer tech.

I don’t really get this. I’ve used Matplotlib for the last ten years, and it just gets better.

From an ease-of-use perspective, it certainly had the room to improve.

Matplotlib is often a dependency for Python dataviz libraries.

I often just skip the middlemen and use it directly.

Matplotlib isn't going away any time soon.

MPL is great, but it's pretty much only a dependency of Seaborn. It's not a dependency of Bokeh, Altair, or Plotly.
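
If you'd rather check than argue, the declared dependencies of whatever is installed locally can be inspected with the standard library. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`; the helper names `requirement_name` and `depends_on` are mine, and the name parsing is a simplification rather than a full requirement parser.

```python
import re
from importlib.metadata import PackageNotFoundError, requires


def requirement_name(req):
    """Extract the bare project name from a string like 'matplotlib>=3.4'."""
    name = re.split(r"[\s;<>=!~\[(]", req.strip(), maxsplit=1)[0]
    # Normalize so that 'Foo_Bar' and 'foo-bar' compare equal.
    return re.sub(r"[-_.]+", "-", name).lower()


def depends_on(package, dependency):
    """True/False if `package` declares `dependency`; None if not installed.

    Note: optional extras are counted as dependencies too.
    """
    try:
        declared = requires(package) or []
    except PackageNotFoundError:
        return None
    wanted = requirement_name(dependency)
    return any(requirement_name(req) == wanted for req in declared)


for lib in ("seaborn", "bokeh", "altair", "plotly"):
    print(lib, depends_on(lib, "matplotlib"))
```

This only reports direct, declared requirements for the versions you have installed, so results can differ between releases.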

> there's a non-trivial chance that the choice you make will join the ever-growing collection of library abandon-ware in the not too distant future.

I’ve been using Matplotlib for the last 10 years and expect to continue using it for the next 20. The stability is there, you just need to resist the urge to keep switching. It’s like vi: whatever your “favourite” editor is, you know that vi will always be there.

Of course, as others have said, ggplot2 is better, but I am going to check out plotnine now...