Hacker News new | ask | show | jobs
by ramblenode 3589 days ago
MS Excel is absolutely unfit for most scientific and engineering problems.

The spreadsheet GUI, lack of good version tracking/history, and eagerness to coerce data types and "correct" values makes it easy to introduce errors that will go unrecognized and propagated through calculations. Unfortunately this story just keeps repeating itself.

But all of this is just a secondary concern to Excel's real trouble: it's history of incorrectly implementing numerical and statistical procedures. One could plumb the depths of this topic for hours, but here are a few highlights: regression formula accepts illegal/nonsensical inputs (e.g. collinear predictors) and gives illegal/nonsensical outputs [0], variance/standard deviation change incorrectly with sample size [0], output of a paired t-test changes when missing values are included [0], formulas are mislabeled [0], v. 2007 gives very wrong answers to 11 of 27 tests in the NIST test suite used for statistical software benchmarks [1], the random number generator was broken as late as v. 2007 [1], and calculations relying on any of 12 particular floats display an incorrect result [2]. There are plenty of other issues mentioned in the links and elsewhere; if you're interested you'll have no trouble finding them.

Remember, friends don't let friends use Excel for science. :)

[0] http://people.stern.nyu.edu/jsimonof/classes/1305/pdf/excelr...

[1] http://www.pages.drexel.edu/~bdm25/excel2007.pdf

[2] https://blogs.office.com/2007/09/25/calculation-issue-update...

Edit: clarify and add a new issue I became aware of while researching further.

7 comments

I'll just quote Soustrup here: "There are only two kinds of languages: the ones people complain about and the ones nobody uses." In particular you see the spreadsheet GUI as a downfall, but I think it is a great enabler that has allowed millions of non-programmers to create useful programs.
I would counter with "use the right tool for the job."

Excel is great at churning out fast and dirty estimates for low impact work. The problem is when it's used for large, complex, or important problems because these are just not Excel's domain--something obvious when looking at the kinds of features and bug fixes MS has prioritized over the years.

I think the Excel team disagrees with you. I remember having even seen an Excel demo about big data on channel9 (apparently if it fits in a spreadsheet, it is still big data!).

Based on the evolution of Excel, they clearly see it as a data analysis tool only, with lots of iterations of the pivot tables functionality. And clearly not as a modeling tool.

My guess on Excel usage is that it is 10% of the time used for data analysis (pivot tables, time series analysis), 20% of the time used to create a simple table (planning for the week by non technical people), and 70% of the time used for modelling (business plan, tax calculations, accounting, pricing financial instruments, running an inventory, calculating something scientific/mathematic, etc). Most companies are run on Excel.

For instance one thing where Excel sucks at one of its core functionalities is linking a powerpoint presentation to an Excel model. Any consultant, expert, banker, accountant, salesperson or marketer will do that all day. You can paste a table as a linked image but the formatting will be completely unstable and it is very dodgy. And no way to link a number in a textbox to Excel. Microsoft should focus on these problems rather than yet another pivot table functionality.

Though office seems to be frozen time. I can't think of any major new feature since Office 2007.

It is damning that you cannot use a basic computer tool for anything that is large, complex, or important. Isn't that exactly what computers are for? Call me idealistic but I want a spreadhseet that can handle milions of rows. Maybe add a "strict" mode to keep people out of trouble.
You could have an option to disable any formatting in a tab, and then Excel should be able to handle a billion rows in that tab. But you can't have all the flexibility of Excel on a massive scale. Even a laptop today is insanely fast and has an insane amount of memory.
PowerPivot in Excel does allow you to have millions of rows in tables - mind you that isn't in the standard spreadsheet functionality, but it is there.
Excel is used HEAVILY at all levels of business for large, complex, and important things. I'm curious what this right tool is that they should be using instead.
Python with a good DAG library, e.g. https://github.com/manahl/mdf (but there are plenty of others kicking around).

What this lets you do is in Python build networks of computations then change one value and it will cascade through the rest of your model, exactly like you do in Excel, but in actual code that can easily be versioned, reused, etc.

Java, for example.

Yes, it's true Excel is used for a lot of big important things, but I'd agree with GP it's not very good at them. Moreover, I think most people who use Excel for big important things, would agree it's not good for it--they get stuck in that situation when a small spreadsheet grows big over the years. Lack of version control, limited modularity, and no ability to unit test make large spreadsheets error-prone, and Excel tends to barf on you when you push it too hard (seems to be a number of crash bugs and performance bugs that never get fixed).

Excel is incredibly productive for quick analysis and making charts. But it's really not a development platform and it's really not for heavy lifting.

The primary reason Java (and other programming languages) aren't, and won't be used instead of Excel is that Excel is interactive. About the closest you can get to that with a programming language is Lisp, but it's still half-way there compared to a spreadsheet, when you can see the data all the time as it is placed and updated.

The second big reason is that Excel just works, and all programming languages usually require installing tons of bullshit - editors, toolchains, whatever - to be able to work somewhat conveniently in them, and the moment any link in the bullshit chain crashes, you're SOL if you're not a developer. Yeah, we sometimes underestimate just how much minutiae knowledge we have that allows us to fix random problems in our tools without even thinking for a second about it.

So maybe Excel is unfit for purpose, but it's still the most fit for purpose tool available.

> About the closest you can get to that with a programming language is Lisp, but it's still half-way there compared to a spreadsheet, when you can see the data all the time as it is placed and updated.

I think that Kenny Tilton's Cells was an attempt to get that sort of interactive update functionality in Lisp. I was never able to figure it out though.

Using Java probably won't help you with the numerical issues though, unless you happen to use a good statistical library.
In fact a lot could be said of the millions of ways to shoot oneself in the foot with programming languages. Excel being visual (the formula is evaluated/tested immediatly after being entered), I'd argue a diligent user will make less mistakes with Excel than while coding.
This is true for small spreadsheets, and one reason why I love Excel for quick analyses. Debugging is nearly instant because you see the results as you go along. But it doesn't scale very well.

--If you want to re-use a formula, you basically have to copy-paste it. Generally there isn't much modularity. Programmers are well aware of the dangers of this.

--It's also not a friendly platform for writing regression tests (possible with VB, but who does that?).

--The two-dimensional nature of the spreadsheets and lack of loops means you often need to do hacky stuff to emulate multidimensional arrays or loops. Easy to make mistakes here.

> Excel being visual (the formula is evaluated/tested immediatly after being entered)

We generally call this "interactive" and not "visual", in order to avoid confusion with graphical programming languages.

I agree that being interactive probably helps rapid iteration, debugging, and experimentation. However, Excel could still be interactive without being so fast and loose with types, accurate display of numbers, and many of the other faults others have pointed out.

Also, if you're trying to make some subtle point about the reduced severity of mistakes, you mean "lesser" rather than "less", otherwise you mean "fewer". https://duckduckgo.com/?q=less+vs+fewer

If the programs contain errors, they are not useful. They appear to be useful to someone who just wants to see some activity instead of actual work.
All programs contain errors, and almost all output of calculations is approximate.

The question is if for most of the use cases the margin of error is just fine...

I can assure you that most "traditional" engineering (thinking mostly mechanical and civil) calculations, for things you use daily, were done in excel. Anything that would be a pit too much of pain to manually bang out on a calculator, but that doesn't really require complex logic (looping etc)? Excel.

Most engineers are more similar to business people in that regards, they see traditional programming languages as too complicated. We learned MATLAB in university, two classes, basic intro to programming and then numerical methods. Many people had to retake the first class at least once.

I'd love to bang out my calculations in IPython Notebooks, but the most important requirement of engineering calculations is that they can be documented and understood by peers. Since none of my peers are interested in learning python, it's useless. Excel is the lowest common denominator, everyone gets it.

I once read this interesting contrast on HN and it stuck with me:

"Traditional programming languages show the program but hide the data. Spreadsheets hide the program but show the data".

At university, we were forbidden from using Excel, and only FORTRAN 77 was allowed. At the time, this seemed strange and anachronistic, but when you consider that it had been in use for 25 years at the time, and was a known quantity that didn't have strange and unknown bugs, it actually seems sensible in retrospect.

I do however recall being mightily annoyed that I was having to calculate ephemera in "arcane bullshit", so I can understand why many shortcut to excel.

Experience makes fools of our past selves.

A company I worked for banned Excel for analysis and made everyone use Ipython notebooks because the analysis was transparent and very easily repeatable.
You can share your Excel files and macros with other people too. Python notebooks are kind of cool but if you need something quickly to sift through the data to have an idea, Excel is an amazingly good tool for it
I'm an Ecology student, and the last few days there's been a lot of (justified) criticism of Excel. I use R and Python for data analysis, but a lot of what I do is gathering data, inputting into Excel and then exporting this as a CSV for use in R/Python for statistical analysis - are there any other tools that can replace Excel for this step?
How do you gather the data? You could learn scraping in python, for example. If you manually input the data, a simple spreadsheet is fine for that.

If you have the data in a non csv format and use excel to transform it into csv, python and R have utilities for that job (python has pandas, I don't use R as much for data cleaning, but I know it can).

Yeah, it's a lot of manual entering of data (eg finding abundance of certain species of lichens, and lat/long, then manually entering all data and exporting this as a csv either for heatmaps with GIS or for more stats analysis and the same with animal species abundance) so it's good to know I'm not missing something fairly obvious!
Using unfit programs is a hard problem. Remember this thread?

- Ask HN: Suggestions for spacewalk procedure writing?

https://news.ycombinator.com/item?id=5585535

(It is asking for alternatives to Word for EVA planning)

.... and then we get into the whole realm of simple user errors - alarmingly easy to commit in an environment that positively encourages the production of a rats' nest of essentially unauditable calculations - then then feed into important financial decisions.
I have to agree, Excel and science just don't go with one another.