Hacker News new | ask | show | jobs
by bash-j 1998 days ago
This has frustrated me as a python user for the last 7 years, working as the only python user in business environments dominated by Excel. People will say things like, "if you leave, who can support this report you made in python?" Well I say, who can support the bloated 40mb spreadsheet that would take forever to unpick and figure out how to update with new data? No one can, because I've seen people would rather rebuild their own spreadsheet from scratch, than use the files they inherited from the last person.

If these tools are necessary to conduct business and they are so worried about being able to support it, why don't they use proper software for that process?

A lot of people who make these bloated spreadsheets are people with no education in computing, and don't think about the basics of how to store data that is easy to analyse later. If they are building a weekly report, they build the report and enter the data directly into the report structure, which then makes it almost impossible to analyse later. Next week they just copy the file, rename it and update the data. If you want to analyse that same data over a year, good luck! You can't even count on the data being in the same place over the 52 weeks, since they would have added and removed data points over time.

Once I got the process down in a jupyter notebook, handling all the oddities with the data coming from whichever website, CSV file, data warehouse report I need, I can just save it as a .py file and run it as a scheduled task on a virtual computer forever. The data is kept in a format that can be appended to with each update, and can be easily analysed later.

The most amazing thing with replacing excel with python is you don't need to manually perform the update process yourself. Which means it doesn't cost anything to run the process more often. Weekly reports can become daily, or even hourly email updates that are only sent when something interesting happens. People can start reacting to things shortly after they happen, rather than having to remember what happened a week or a month ago. The iteration on improving becomes so much faster. People spend more of their time discussing how to fix problems, rather than spending time building problem finders. You can even start to automate the fixing of the problem in python and people don't even have to spend time on that thing at all, ever again.

2 comments

> You can't even count on the data being in the same place over the 52 weeks, since they would have added and removed data points over time.

> I can just save it as a .py file and run it as a scheduled task on a virtual computer forever.

This is rather naive and short-sighted. Do you think the spreadsheet guy is moving data points around because s/he's bored at work and is screwing around with no purpose? No, the business requirements change, so he needs to update the spreadsheet to incorporate the new rules and/or data.

Which is exactly what you'll need to do with your python program, otherwise it also will break and/or produce incorrect results.

Simple example: calculate available vacation days. Last year company policy was simple, use it or lose it. Just subtract days allotted minus days used in the calendar year. This year company policy allows for up to 5 to be rolled over. Now we also need to know how many were available last year, how many were used, how many could be rolled over. Your Python program importing from SQL query, CSV, data warehouse report... totally breaks now that the data source has 5 columns instead of 2.

Claiming you can build a program in Python or any other language and run it "forever," in the context of a business, makes the whole comment lose any credibility.

It sounds like you missed the point of GP's post.

He was talking about avoiding manual weekly data copy-paste errors by writing code to do it in a predictable format.

I think you assumed that they meant the code would never have to be changed again, when they were actually talking about being able to standardize the data update process.

I don't think they said anything about never having to change the code, just that running a software process saves each user from manually pasting data every week.

I'll stick with bash-j's own words, where he bragged about running a python script "forever", and not your interpretation of his words. I'm also going to dismiss what you "think [I] assumed that they meant"

However, if you, or bash-j, believe Excel is incapable of avoiding manual data updates, that's simply not correct. Excel provides at least 3 ways of bypassing manual data entry in favor of automated imports from an external data source -- VBA code, ODC (not to be confused with ODBC) data connection definitions, and PowerQuery (I think that is the current name, haven't used it in a long time).

I will grant that tracking vacation days in a 5 person company might be a valid use of excel.
I've used excel and python in lots of business contexts. For most tasks involving domain experts, excel usually wins hands down.

An excel spreadsheet is usually easily auditable. The visual presentation and layout lends itself to review by others. You can click and point at values. Python and other programming languages require an environment and tooling that can't be easily supported across the enterprise. It requires source control systems and code review.

Programming languages are also too "dynamic". Using excel I can bring in a hard-coded report and link to those values in another tab. In python I'll have to save those to another file or re-query the data source, which may have changed due to new values being retrospectively added.

Python is the right tool for lots of analysis tasks, but for most corporate reports it's hard to beat excel. Programmers are also more expensive than corporate analysts. So you would end up replacing teams of low-cost high-retention analysts with high-cost low-retention programmers.

how does Python require source control? I guess you can just drop your files on a sharedrive... just like ... EXCEL!

Also you reference an existing report? What's exactly the problem of serializing your data after a run of your python program? Actually, if you think this through, you would probably establish some pipeline architecture to just continously integrate your results.

> An excel spreadsheet is usually easily auditable. The visual presentation and layout lends itself to review by others.

Tell that the next poor soul who has to edit some organically grown spreadsheets powered by VBA and malformed CSV files which generate your company's financial reports.

> An excel spreadsheet is usually easily auditable.

What does this sentence mean? Virtually no part of excel is easily auditable, at best you can see "this file was changed by xyz at a:b:c" identifying the cells that have changed between versions wouldn't even be easy.

All versions of Excel from 2013 and onward come with a tool that lists out cell by cell differences between two spreadsheets, called Spreadsheet Compare [0].

It lists side-by-side differences in hardcoded values, formula changes, calculated value changes, and even changes in VBA code. The list of changes can then be dumped out to a text file.

Default Excel installations also include the Inquire add-in which allows you to perform the comparison within Excel itself.

[0] https://support.microsoft.com/en-us/office/compare-two-versi...