Hacker News new | ask | show | jobs
by cgranier 1338 days ago
Every new hire gets to watch this video (https://www.youtube.com/watch?v=0nbkaYsR94c) then we go through daily Excel tips and tricks until they're comfortable with it. I know Excel is not the right tool for everyone or everything, but the amount of useful work you can get done quickly IF you know a few basic Excel text manipulation functions is crazy.

I even have an Excel spreadsheet that helps me solve Wordle.

4 comments

Excel is great until it isn't. As one example, I've had grad students tell me about a surprising result they found when they did some simple descriptive statistics in Excel. When I ask them to show me, or to check something else, they can't replicate their finding. Then the inevitable "Oh, I grabbed the wrong column" or some such after ten minutes of playing around with it. The thing that makes Excel convenient but dangerous is the lack of guardrails.
I know there are better ways but this can/should be seen as part of the learning curve with Excel. The guardrail was seeing "surprising results" they should have known to double check their steps, see if they could recreate it, etc. before showing it to you. Not sure how you handle this but some times it's a hard lesson to learn. Eg. I've seen people convince a public company executives to do strategy X based on analysis Y. Then when things aren't going as promised I get asked to review the strategy, I might find some elaborate model where somewhere deep in the layers of math/logic/lookups, someone divided instead of multiplied or something like that. They then have egg on their face, the strategy may have been implemented already, and I usually have the uncomfortable job of explaining to management why the projections were off and the strategy is probably doomed in terms of reaching the expected outcomes (basically throwing somebody else under the bus due to an innocent math error). I see this so much in my job, it's actually routine for me or my team to review any numbers the company management will see before they see it because of things like this.
> Eg. I've seen people convince a public company executives to do strategy X based on analysis Y. Then when things aren't going as promised I get asked to review the strategy, I might find some elaborate model where somewhere deep in the layers of math/logic/lookups, someone divided instead of multiplied or something like that.

Is it Excel-specific though? I've made these kinds of errors with Excel (though not with stakes this high). I've also made them a lot when doing math on paper. And I've also made them in C++, Common Lisp, Matlab, Python, R and JavaScript. Now, with those other tools, it's easier to spot an error in a formula on review - but in Excel, it's easier to spot the intermediary results being off, so it's a wash.

I think the thing to learn is to be more careful, to sanity-check intermediary calculations; these kinds of errors are about being momentarily confused, and will happen regardless of the tool you use.

Yeah I agree with your conclusion and excel just happens to be the tool a majority of people are using so it’s a poster child of sorts. It also gives people “false” confidence in their ability at times when really they should probably ask for help. Using the reality of my team as an example again, a lot of people will just stay out of excel and ask us to build a model up front for a new strategy. My team isn’t flawless but much more experienced in the quirks and mental model of how to setup data for formulas, scenarios, etc. We also check each other’s work before something high stakes goes out of the team. Since we’re the experts it hurts us that much more when some miscalculation gets out and causes disruption.
> sanity-check intermediary calculations

I am very far from being an Excel guru, very far, but one thing I have found useful is having everything come out to an intermediate result, and calculate against those intermediate results. If you see weird numbers in the middle, then end result is likely wrong, figure out why your intermediate results are messed up. If you want to be fancy, you can even error bound your intermediates if you know your valid ranges.

When I was in grad school, I was doing some legal consulting on the side. A big part of one of the projects (a high nine-figure case) involved a bunch of Excel work. My boss wanted me at the trial with a laptop, Excel, and all the data just in case the other side revealed any errors in my results. He told me about other cases that were lost due to minor errors that, upon further review, had no meaningful impact on the analysis.

I did not believe Excel was the right tool for the job. As you might expect, I had high levels of stress until they announced a settlement right before the trial was set to begin. I later realized that I should have done everything elsewhere to confirm that my Excel results were correct.

You should have also hired an audit/accounting firm to vouch for the math. In some situations you can actually get them to be liable for the results being accurate.
It's a lot harder to check your work in a spreadsheet vs. some sort of scripting language like R/Python etc. By default what you did is hidden, and it's easy to grab the wrong row/column and not notice.

For a while Excel was messing up basic statistics functions too. Wouldn't be surprised if there was something else not quite right in there.

I’d say having a test suite available is the bigger advantage of something like python. Most of the big mistakes people make in Excel are logic mistakes and they could happen anywhere. It’s just plain old human error. But if you can run a test and it fails, then at least you know to go looking for the bug.

Either way you’re likely trusting that the underlying code is error free. But most people aren’t checking their python imports. They may or may not be aware of the problems floats can introduce. So on.

Excel forces you to break down and structure your tables in a meaningful way. I find it much easier to spot errors in than a script personally.

I think how confident you feel with excel will depend of what you do with it and how you were taught to use it. Years of consulting basically drilled into me that all my tables should be built as if they were going to be delivered to a client who will need to understand them at some point. If you lack structure however, it quickly devolves into chaos.

How do you use a spreadsheet in both an intuitive and a powerful manner at once?

For instance, you intuitively start with just a column of numbers, and then when you forget what they are you move the column down one and put a title above it, etc. Working this way I often end up with formulas that are off by one or two cells even though things should be adjusted, and it's hard to audit without mousing over the formulas and just looking.

If I'm trying to write a tool for others to use I'll do something more formal like putting the data on its own sheets and naming them. That helps a bit, but means that all my data is hidden on different sheets and I can't get a holistic view of the problem. And to be consistent in my formulas I'm tempted to use this model even for data that doesn't need it (not an array, for instance). And it still doesn't help for array-bounds type problems.

My short answer is to build up your intuition. This is not a static value. It can be improved.

I think everyone has slightly different workflows and this has no real solid answer. There’s always some trade offs. Just working in excel a lot helps, if you’re working on varying levels of complexity. I do it as a job and often helping others so I have a lot of exposure to different problems and input data types and even the desired outputs.

The intuitive example you mentioned never happens to me for example because of 2 things; 1) I intuitively leave space to add a header, it’s such a common thing to do, my intuition knows to account for it up front 2) if for some reason I ignored #1, I know how to move things without breaking the formula references and when to employ different approaches to that problem. Things like how copy and cut differ, Inserting a row above row 1, etc. One of my little hacks is avoiding referencing a range like A1:C5 and instead will make it A:C if it’s just a basic table of data. Your file may have a ton of references to this table of data once it’s built out and adding a row of data then requires some manual housekeeping which introduces an opportunity for a bug to occur. With my approach, I can add or subtract data or rows and non of the range references need to change. (Someone May point out that you could expand the table range for 6 rows in a way that the other references would expand, it’s true but I don’t design my files for that, because adding a row may have break something else to the right of column C).

A lot of the way i use and setup the data within a spreadsheet is in anticipation or avoidance of future issues. This only comes with experience. Which is kind of my answer to your more complex example. When developing for others, you’re usually trying to hide the data and complexity of things and expose only the useful bits for that end user. It’s not perfect and is very annoying that I have to hide and unhide things constantly to inspect the functionality. But that’s just the way it is. I’ve done some things like written little macros that hide all of the background stuff if a keyword is in the file name (imagine having a dev and prod version of the file, if prod is in the file name when saving all the cleanup and hiding of things happen. That’s what I distribute. The dev version is the same and where I work. When I save that file prod is not in the file name so everything in that macro doesn’t execute.)

https://www.cassotis.com/insights/88-of-the-excel-spreadshee...

"Considering the complexity of Excel spreadsheets and the lack of mastery by many users, we have come to the title of this post: more than 80% of the Excel spreadsheets have errors. Ray Panko, a University of Hawaii professor, discovered that, on average, 88% of the Excel spreadsheets have 1% or more errors in their formulas."

There was some research on Excel's impact on finance with so many errors, but I couldn't find the citation.

What percentage of in-house crud apps have errors?
No idea, but probably much less than 88%.
SQL constraints can prevent erroneous data. Excel does not guide a user to constraining their data to improve its accuracy.

https://www.forbes.com/sites/salesforce/2014/09/13/sorry-spr...

One of the most infamous excel errors even helped shaping global economic policy: https://www.uschamberfoundation.org/blog/post/did-excel-erro...
I once read somewhere, maybe here: "If I were to choose a single software to use for the rest of my life, it'd be Excel". I agree with that.
... until all your numbers become dates.
Or all your zip codes with leading zeros get silently truncated.
Or until all your 16 digit IDs get truncated...
... or until all your molecules become dates.
Why do I remember awk with that story?
Or your numbers are wrong because you didn't wrap in parentheses.

Example, put these two formulas in excel:

=(4/3 - 1)*3 - 1

=((4/3 - 1)*3 - 1)

The online version, at least, seems to be wrong when you do wrap it in parentheses:

=(4/3 - 1)*3 - 1 = 0

=((4/3 - 1)*3 - 1) = -2.22045E-16

Check out page 3-5 of this paper called How Futile are Mindless Assessments of Roundoff in Floating-Point Computation? [0]. It explains what is happening here better than I can and is where I got this example from.

[0]: https://people.eecs.berkeley.edu/~wkahan/Mindless.pdf

Or until your empty cells aren't truly empty, so you can't count the nonempty cells.
That reminds of a jokester who replaced all empty cells in small spread sheet with a space. Good lick figuring out why your 5 row table all of a sudden was in the giga bytes of size (obvious exageration for dramatic reasons). Was a good practical joke so!
I would choose a web browser.
Wouldn't python be a better answer, or is that cheating?
Emacs would be the real cheating answer.
Create a table in org mode, pass it to Common Lisp (or Perl or Scheme or R or whatever else floats your boat), and finally hand the thing over to graphviz for an in-line view of the results... and you can write up the relevant formulas with LaTeX.
Everything in Julia src-blocks please. Super comfy.
Is python really considered "software" in common parlance?

I think the comparable item would be something like VSCode.

>Every new hire

In what industry?

At the time we were doing online video distribution, which went from ingesting content from traditional sources (betacam, film, etc), editing, preparing distribution-ready videos, create metadata, upload to various systems... all the way to social media campaigns, etc.

Where/how does Excel knowledge factor into this?

Well, we managed a lot of data, mostly (but not only) CSV files. It's very useful to learn a few text manipulation functions in Excel when creating CSV files with sequential and or repetitive content (think TV episodes, etc). Or creating several flavors of CSV for each platform. Yes, a database backend with smart exporting functions might work well, but sometimes fast beats perfect, especially for one-off jobs.

Uploading thousands of videos into YouTube was much quicker with one or two CSV files. By learning some basic Excel, everyone was able to minimize errors and maximize output.

What else could we do with Excel? We could export XML files of our video edits from Premiere/FinalCut Pro, run them through a script into Excel and immediately get a report showing all the editing errors that still needed fixing (we had to edit the videos in a very particular way). This alone saved sooooo much time. Interestingly enough, we were also able to identify individual editors by the mistakes they made (it seems each one had a particular quirk).

I also ran the entire digitizing project in an Excel file, complete with burn charts and velocity calculations.

Over the years, I've received calls from every one of my employees, now on with their lives in other jobs, and one thing they're always grateful for are the Excel lessons.

And once you learn the logic behind building Excel functions and spreadsheets it opens your mind to other uses or more programming skills.

It's much easier to teach someone the power of a few choice Excel functions than to teach them Python from scratch. Plus you can see their eyes light up immediately. Fun times.

I think of Excel as great at prototyping stuff. At some point it is better to move your project to a more formal system and leave the warts and limitations of Excel behind. if your project is data heavy this will be a relational database. if your project is logic heavy this will be a programing language.
Exactly. But it is crazy how far you can take Excel (and even Google Sheets) with the proper organization.