| I have a habit of taking notes on these types of talks when I stumble upon them, so for anyone interested, here's a quick write up of the video. Some of it is paraphrased,. * "[Microsoft estimates that 750M users of Excel. (%7 of the world population).]" * Spreadsheet errors basically ruined the economy of Greece. * "State of the art" is manually double checking your formulas. This is what the experts suggest… * There's apparently an article out there from Forbes titled "Sorry, Your Spreadsheet Has Errors (Almost 90% Do)". * Thomas Herndon is the guy that did manual spreadsheet verification to prove that there were errors. * Talks about CheckCell, ExceLint. * Input errors are a huge problem: "Roughly 1% of characters people mistype." * "[1 out of 20 cells manually typed probably has an error.]" (Woah.) * "[Users often add a digit or remove a digit, changing the order of magnitude]" * 1) Manual data entry is hard to do correctly, 2) Writing formulas/code/Excel that uses that data is also hard to do correctly. * One take away: Like code, if you're not testing it manually, and no one is testing it for you manually, and you're not writing tests, and the results aren't "gut-checked" or the results aren't used, why would it be correct? If a tree falls… * "[A lot of public posted Excel sheets are filled with errors, or fudging.]" Look at the grades one that he shows around 19:36 to see what I mean. * "The Bootstrap" - stats analysis using simulations. "[Resample samples]... random sample with replacement, repeatedly, to get distribution of output of calculation." Requires a homogenous range. Allows you to find "outliers" that drastically change the output. "What is the likelihood of observing one of the simulations under the null hypothesis, and if it's below [X] then we say it's unusual." Dude in audience at 30:29 describes it well. * Formulas are easier to audit because they're usually named w/ column, etc. Data is hard. * Goes through a long process of describing how they gather data, etc. Good stuff, but the short and the long of it is CheckCell is good. * Loops back around to the global finance sheet that had a lot of errors: CheckCell worked on it. * ExceLint - static analysis, ranks errors and their fixes. Can find off-by-one-like errors. Formulas using off-by-one ranges, etc.Excel has its own error finder, but it gives a lot of false positives and false negatives."Most errors are reference errors" - wrong row, wrong column, too short a range, too long a range, etc. "Looks for disruptions in rectangular regions." Not just outliers. Looking for irregularity, where regularity is basically low entropy. "[Capture the relationship of cells/ranges and their relationship to one another.]" Looking for relationships that minimize entropy ("[Because users aren't insane and they're putting things in a rectangular grid.]" Looks for every rectangle (i.e. range) that when merged with a neighbor, would remain rectangular. That is considered a potential fix. Then you can simulate the fix as if you already did it, and check the entropy on that. * A lot of the errors, and their origins have to do with basic Excel features. Some of these features were outlined as best practices in Joel's "You suck at Excel" talk, which is kinda funny. Great power, Uncle Ben, etc. * Dropped this one: "SUM is [something like 45% of formulas]". (edit: spacing out list) |