by gravypod 3295 days ago
> The inability to use Pythonisms with Pandas is insane and I had to do a data analysis where I really, genuinely needed to do some looping and some simple map/reduce and it almost drove me insane.

I recently started a project that I got to write from the ground up by myself. I was happy with the processing side of things. I was very sad with the data I was getting in and putting out. There's some impedance mismatch that doesn't need to exist.

> You might like [Agate](http://agate.readthedocs.io/) better.

I looked at the front page and definitely wasn't enjoying what I was seeing. At first, it looked like more complexity piled on top of things that don't need it. Then I saw this link: http://agate.readthedocs.io/en/1.6.0/cookbook/compute.html#l...

This is definitely worth a try. Much closer to what I was thinking.

> I don't mind that matplotlib is kind of awful -- that data viz would never go in a published piece in any event. I just want some hints as to what I or more likely a teammate would build in D3 around the specifics of the data.

Sadly, in my field matplotlib is the professional tool (hah!). The end goal is the matplotlib plots. I'd be fine with tweaking things in a design program and putting that up, but I'd be upset with myself.

My end goal is to have a single script in a repository that installs, runs, and then compiles my papers. I don't want anyone to need to look at sub-standard copies of my plots. I want anyone to be able to jump in and check my work and create derivative works.
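To make that concrete, here is a rough sketch of the kind of single entry-point script I mean. Everything in it is an assumption for illustration: the file names (`requirements.txt`, `analysis/make_plots.py`, `paper/main.tex`) are hypothetical placeholders, and it assumes `pip` and `latexmk` are available on the machine. It is not a description of any existing project.

```python
import subprocess
import sys
from pathlib import Path

# Hypothetical placeholder paths; adjust to the layout of the actual repository.
REQUIREMENTS = Path("requirements.txt")
ANALYSIS_SCRIPT = Path("analysis/make_plots.py")
PAPER_SOURCE = Path("paper/main.tex")


def build_steps(python=sys.executable):
    """Return the ordered commands: install, run the analysis, compile the paper."""
    return [
        # 1. Install the pinned dependencies.
        [python, "-m", "pip", "install", "-r", str(REQUIREMENTS)],
        # 2. Run the analysis, which is assumed to write the matplotlib figures.
        [python, str(ANALYSIS_SCRIPT)],
        # 3. Compile the paper (assumes latexmk is installed).
        ["latexmk", "-pdf", "-cd", str(PAPER_SOURCE)],
    ]


def run_pipeline(steps, runner=subprocess.run):
    """Run each step in order; check=True makes subprocess.run raise on failure."""
    for command in steps:
        runner(command, check=True)
    return len(steps)
```

Calling `run_pipeline(build_steps())` from the repository root would then do the whole thing in one go. The `runner` argument is only there so the steps can be checked without actually executing anything.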

Sadly, this is not common in science today, so there aren't really good tools for this sort of thing on the composition side. Even worse, plotting isn't common in the computer world, so tools for that don't exist either.

1 comment

> I recently started a project that I got to write from the ground up by myself. I was happy with the processing side of things. I was very sad with the data I was getting in and putting out. There's some impedance mismatch that doesn't need to exist.

Impedance mismatch is a great way to put it. For me, if I can deal with that mismatch so that newbies/journalism colleagues don't have to, I'll do it.

> Sadly, in my field matplotlib is the professional tool (hah!). The end goal is the matplotlib plots. I'd be fine with tweaking things in a design program and putting that up, but I'd be upset with myself.

I used to work in science, and I've found that journalism has solved many of these issues better (at the expense, of course, of specialization and depth -- even a yearlong project isn't quite the same as decades of experience working in a single area). The solutions aren't pure or pretty -- they're more about workflow, and they're held together with duct tape and baling wire. But the competitive pressure to deliver data with a good user experience on deadline is very powerful and has led to some effective practices.