Hacker News
by packetlost 1390 days ago
> You know, in data science, you see people spending hours writing pandas scripts that replicate a few clicks in Excel for a one-off analysis

I mean, having an Excel doc at all usually implies hour(s) of work formatting the data in a structured manner. Sometimes it's collective decades of work, depending on how much heavy lifting your 15GB .xlsx is doing.

2 comments

This is why I've adopted R and Python for the data work I do. I have a bunch of exported data (CSV files) that I use. Manipulating the structure and format is 90% of the work. I wrote the scripts once, and now I can reuse them for everything instead of playing games getting those CSV files (dates in particular) to play nicely.
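To make the "write it once, reuse it" point concrete, here is a minimal sketch in pandas. The function name `load_export`, the column names, and the sample rows are all made up for illustration; real exports will need their own cleanup rules.

```python
import io

import pandas as pd


def load_export(source, date_columns):
    """Read a CSV export and normalize its headers and date columns.

    `source` is anything pandas.read_csv accepts (path or file-like object).
    `date_columns` lists the already-normalized names of columns to parse.
    """
    df = pd.read_csv(source)
    # Normalize headers: trim whitespace, lowercase, spaces to underscores.
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    for col in date_columns:
        # errors="coerce" turns unparseable values into NaT instead of raising.
        df[col] = pd.to_datetime(df[col], errors="coerce")
    return df


# Hypothetical export with one bad date, standing in for a real CSV file.
sample = io.StringIO(
    "Order Date,Region,Amount\n"
    "2021-03-01,East,10.5\n"
    "not a date,West,4.0\n"
)
orders = load_export(sample, date_columns=["order_date"])
print(orders.dtypes)
```

Once something like this exists, every later analysis starts from an already-clean frame rather than from the raw export.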

Even a one-off analysis is actually FASTER in Pandas because I've already done the work of farting around with the formatting. Now I can just write the necessary analysis code, rather than deal with the formatting.

That said, my data analytics work is seriously small potatoes compared to many. But I can write a quick pivot table using dplyr faster than I can do it in Excel.
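For comparison, a rough pandas version of a quick pivot table might look like the sketch below. It is not the dplyr code referred to above, and the `sales` frame with its column names is invented sample data.

```python
import pandas as pd

# Invented sample data; column names are assumptions for illustration.
sales = pd.DataFrame(
    {
        "region": ["East", "East", "West", "West"],
        "quarter": ["Q1", "Q2", "Q1", "Q1"],
        "amount": [10.0, 20.0, 5.0, 7.0],
    }
)

# Rows are regions, columns are quarters, cells are summed amounts.
# fill_value=0 fills region/quarter combinations that have no rows.
pivot = sales.pivot_table(
    index="region",
    columns="quarter",
    values="amount",
    aggfunc="sum",
    fill_value=0,
)
print(pivot)
```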

Often that work exists regardless of whether the table of processed data, which engineering has already formatted and given a schema, is dumped out to Excel or queried over SQL into Pandas...

I've seen this myself: the person who "naively" downloads that table and plays around in Excel finds interesting things that the person who was using Pandas hadn't, because the code to manipulate columns and do certain types of calculations is actually more time-consuming to write and modify than making a bunch of new columns in Excel with a bunch of formulas!
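For reference, the closest pandas analogue to "a bunch of new columns with a bunch of formulas" is probably chained `assign` calls. This is only a sketch on invented data (the `orders` frame and its columns are assumptions), and whether it is quicker to iterate on than Excel formulas is exactly the point being argued here.

```python
import pandas as pd

# Invented sample data standing in for the downloaded table.
orders = pd.DataFrame(
    {
        "units": [3, 5, 2],
        "unit_price": [4.0, 2.5, 10.0],
        "cost": [9.0, 10.0, 15.0],
    }
)

# Each keyword adds one derived column, much like a new formula column
# in a spreadsheet. Later lambdas can use columns defined earlier in the call.
explored = orders.assign(
    revenue=lambda d: d["units"] * d["unit_price"],
    margin=lambda d: d["revenue"] - d["cost"],
    margin_pct=lambda d: d["margin"] / d["revenue"],
)
print(explored)
```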

A good data scientist will have a more rigorous approach to their notebooks and practice reuse and so on... but that's not necessarily easy.

> the person who "naively" downloads that table and plays around in Excel finds interesting things that the person who was using Pandas hadn't,...

I think they call that serendipity. Never underestimate its power.

https://didgets.substack.com/p/data-science-and-serendipity