Hacker News new | ask | show | jobs
by cosmie 2312 days ago
> because spreadsheets are impossible to scale, which in turn makes the processes that rely on them impossible to scale.

Spreadsheets are entirely possible to scale, that just tends to be a skillset in and of itself, and domain experts organically creating a complex spreadsheet likely don't have the background in process engineering and software design principles to do so themselves.

Generally speaking, you can usually refactor an unmaintainable and complex spreadsheet to mimic software engineering best practices, all without dropping down to VBA. Leveraging named ranges[1], locked cells[2] and formulas[3], data validation[4], and error handling[5]. Combined with some defensive validation checks, you can generally sort out the complexity issue nicely.

Additional enhancements (based on Excel 2010+) can be made using tables and structured references[6], factoring out "data" worksheets into their own workbooks[7] and linking to them from the "user" workbook, adding relational integrity via a data model[8], or scaling data size via PowerQuery[9] (which stores data within the Excel file in a highly compressed, columnar format that transparently gets processed by a local instance of the same VertiPaq engine[10] that powers SQL Server Analysis Services)

You can also drop down to VBA or Javascript[11] if you truly want/need to jump out of the rails of the built in options above. Or in more common cases (which leads to the hell-to-maintain spreadsheets that are more common), if you want to bypass all of the nifty built-in functionality above and do something quick-n-dirty. But if you leverage the above capabilities, you can mature a spreadsheet-based solution quite well and have a battle-tested, stable PoC that can be handed off to a software developer for migrating into a more permanent application.

[1] https://trumpexcel.com/named-ranges-in-excel/

[2] https://support.office.com/en-us/article/lock-cells-to-prote...

[3] https://support.office.com/en-us/article/display-or-hide-for...

[4] https://support.office.com/en-us/article/more-on-data-valida...

[5] https://www.exceltactics.com/definitive-guide-excel-error-ty...

[6] https://support.office.com/en-us/article/overview-of-excel-t...

[7] https://www.microsoftpressstore.com/articles/article.aspx?p=...

[8] https://support.office.com/en-us/article/create-a-data-model...

[9] https://en.wikipedia.org/wiki/Power_Pivot

[10] https://www.microsoftpressstore.com/articles/article.aspx?p=...

[11] https://docs.microsoft.com/en-us/office/dev/add-ins/excel/ex...

2 comments

Good point. I've seen miracles occur simply by refactoring an Excel workbook such that the inputs, outputs, and calculations to map the inputs to the outputs were all delineated into their own clearly-formatted bounded context. And also, when you do drop into vba code, simply organizing the code so that interactions with the sheet are easier to understand, and isolated -- and focusing on "seriality" (no feedback loops, minimize code triggered by events) can transmogrify vba code from the Bogeyman into a friendly neighbor.
For sure with the transmogrification! If the rest of your workbook is properly designed, dropping into VBA for some array operations and direct references[1] can drastically improve performance as opposed to per-cell calculations and looping. And can cleanly compartmentalize functionality in a way that can be refactored into other languages during post-spreadsheet migration as needed.

Although if it fits your usage, PowerQuery and M[2] can be even more performant, if for no other reason than the data being in a more efficient/compressed format. With the nice side effect of creating logic that's transferable as-is from Excel to PowerBI or SQL Server Analysis Services (making for a clean migration path as your solution matures).

[1] https://www.microsoft.com/en-us/microsoft-365/blog/2009/03/1...

[2] https://docs.microsoft.com/en-us/powerquery-m/

You haven't seen anything until you have seen large process plant importing and exporting data from Excel via OPC to allow the non programming process engineers to tweak performance.

Really not the intended use or appropriate criticality management of Excel as a tool, let alone the O/S it runs on, yet it is done.