Hacker News new | ask | show | jobs
by conductr 1338 days ago
I know there are better ways but this can/should be seen as part of the learning curve with Excel. The guardrail was seeing "surprising results" they should have known to double check their steps, see if they could recreate it, etc. before showing it to you. Not sure how you handle this but some times it's a hard lesson to learn. Eg. I've seen people convince a public company executives to do strategy X based on analysis Y. Then when things aren't going as promised I get asked to review the strategy, I might find some elaborate model where somewhere deep in the layers of math/logic/lookups, someone divided instead of multiplied or something like that. They then have egg on their face, the strategy may have been implemented already, and I usually have the uncomfortable job of explaining to management why the projections were off and the strategy is probably doomed in terms of reaching the expected outcomes (basically throwing somebody else under the bus due to an innocent math error). I see this so much in my job, it's actually routine for me or my team to review any numbers the company management will see before they see it because of things like this.
4 comments

> Eg. I've seen people convince a public company executives to do strategy X based on analysis Y. Then when things aren't going as promised I get asked to review the strategy, I might find some elaborate model where somewhere deep in the layers of math/logic/lookups, someone divided instead of multiplied or something like that.

Is it Excel-specific though? I've made these kinds of errors with Excel (though not with stakes this high). I've also made them a lot when doing math on paper. And I've also made them in C++, Common Lisp, Matlab, Python, R and JavaScript. Now, with those other tools, it's easier to spot an error in a formula on review - but in Excel, it's easier to spot the intermediary results being off, so it's a wash.

I think the thing to learn is to be more careful, to sanity-check intermediary calculations; these kinds of errors are about being momentarily confused, and will happen regardless of the tool you use.

Yeah I agree with your conclusion and excel just happens to be the tool a majority of people are using so it’s a poster child of sorts. It also gives people “false” confidence in their ability at times when really they should probably ask for help. Using the reality of my team as an example again, a lot of people will just stay out of excel and ask us to build a model up front for a new strategy. My team isn’t flawless but much more experienced in the quirks and mental model of how to setup data for formulas, scenarios, etc. We also check each other’s work before something high stakes goes out of the team. Since we’re the experts it hurts us that much more when some miscalculation gets out and causes disruption.
> sanity-check intermediary calculations

I am very far from being an Excel guru, very far, but one thing I have found useful is having everything come out to an intermediate result, and calculate against those intermediate results. If you see weird numbers in the middle, then end result is likely wrong, figure out why your intermediate results are messed up. If you want to be fancy, you can even error bound your intermediates if you know your valid ranges.

When I was in grad school, I was doing some legal consulting on the side. A big part of one of the projects (a high nine-figure case) involved a bunch of Excel work. My boss wanted me at the trial with a laptop, Excel, and all the data just in case the other side revealed any errors in my results. He told me about other cases that were lost due to minor errors that, upon further review, had no meaningful impact on the analysis.

I did not believe Excel was the right tool for the job. As you might expect, I had high levels of stress until they announced a settlement right before the trial was set to begin. I later realized that I should have done everything elsewhere to confirm that my Excel results were correct.

You should have also hired an audit/accounting firm to vouch for the math. In some situations you can actually get them to be liable for the results being accurate.
It's a lot harder to check your work in a spreadsheet vs. some sort of scripting language like R/Python etc. By default what you did is hidden, and it's easy to grab the wrong row/column and not notice.

For a while Excel was messing up basic statistics functions too. Wouldn't be surprised if there was something else not quite right in there.

I’d say having a test suite available is the bigger advantage of something like python. Most of the big mistakes people make in Excel are logic mistakes and they could happen anywhere. It’s just plain old human error. But if you can run a test and it fails, then at least you know to go looking for the bug.

Either way you’re likely trusting that the underlying code is error free. But most people aren’t checking their python imports. They may or may not be aware of the problems floats can introduce. So on.

Excel forces you to break down and structure your tables in a meaningful way. I find it much easier to spot errors in than a script personally.

I think how confident you feel with excel will depend of what you do with it and how you were taught to use it. Years of consulting basically drilled into me that all my tables should be built as if they were going to be delivered to a client who will need to understand them at some point. If you lack structure however, it quickly devolves into chaos.

How do you use a spreadsheet in both an intuitive and a powerful manner at once?

For instance, you intuitively start with just a column of numbers, and then when you forget what they are you move the column down one and put a title above it, etc. Working this way I often end up with formulas that are off by one or two cells even though things should be adjusted, and it's hard to audit without mousing over the formulas and just looking.

If I'm trying to write a tool for others to use I'll do something more formal like putting the data on its own sheets and naming them. That helps a bit, but means that all my data is hidden on different sheets and I can't get a holistic view of the problem. And to be consistent in my formulas I'm tempted to use this model even for data that doesn't need it (not an array, for instance). And it still doesn't help for array-bounds type problems.

My short answer is to build up your intuition. This is not a static value. It can be improved.

I think everyone has slightly different workflows and this has no real solid answer. There’s always some trade offs. Just working in excel a lot helps, if you’re working on varying levels of complexity. I do it as a job and often helping others so I have a lot of exposure to different problems and input data types and even the desired outputs.

The intuitive example you mentioned never happens to me for example because of 2 things; 1) I intuitively leave space to add a header, it’s such a common thing to do, my intuition knows to account for it up front 2) if for some reason I ignored #1, I know how to move things without breaking the formula references and when to employ different approaches to that problem. Things like how copy and cut differ, Inserting a row above row 1, etc. One of my little hacks is avoiding referencing a range like A1:C5 and instead will make it A:C if it’s just a basic table of data. Your file may have a ton of references to this table of data once it’s built out and adding a row of data then requires some manual housekeeping which introduces an opportunity for a bug to occur. With my approach, I can add or subtract data or rows and non of the range references need to change. (Someone May point out that you could expand the table range for 6 rows in a way that the other references would expand, it’s true but I don’t design my files for that, because adding a row may have break something else to the right of column C).

A lot of the way i use and setup the data within a spreadsheet is in anticipation or avoidance of future issues. This only comes with experience. Which is kind of my answer to your more complex example. When developing for others, you’re usually trying to hide the data and complexity of things and expose only the useful bits for that end user. It’s not perfect and is very annoying that I have to hide and unhide things constantly to inspect the functionality. But that’s just the way it is. I’ve done some things like written little macros that hide all of the background stuff if a keyword is in the file name (imagine having a dev and prod version of the file, if prod is in the file name when saving all the cleanup and hiding of things happen. That’s what I distribute. The dev version is the same and where I work. When I save that file prod is not in the file name so everything in that macro doesn’t execute.)